Rclone

Rclone is a program for transferring data to and from a variety of services. It is most useful for interacting with object storage (o3://) and Amazon or Google bucket storage (s3:// or gs://). It can, however, download from and upload to a wide variety of other sources, including HTTP, FTP, and Dropbox.

Quickstart

Our primary use of Rclone is to copy from local data storage to remote locations. Once rclone has been configured, you can copy data from a local location to a remote destination with

rclone copy -P /${LOCAL_DIRECTORY} ${SOURCE}:${BUCKET_NAME}/${REMOTE_DIRECTORY}

or from a remote source to a local location with the reverse

rclone copy -P ${SOURCE}:${BUCKET_NAME}/${REMOTE_DIRECTORY} /${LOCAL_DIRECTORY}

replacing the variables between ${} with actual values. In this instance “local” means a computer that you are logged into (the actual computer below your desk or a terminal that is directly interacting with a cluster) and the -P tells Rclone to show progress.

So, for instance, if you wanted to copy the directory tax_documents from the home folder of your local computer to a bucket named oscar on a remote that you have given the nickname google in the rclone.conf file described below, you would use the command

rclone copy -P ~/tax_documents google:oscar/tax_documents

Note that when you copy a directory, you are actually copying the contents of the directory, so you need to specify the target directory (things in ~/tax_documents go in the google:oscar/tax_documents directory).
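To confirm that an upload arrived intact, the copy can be followed by a comparison of both sides. The function below is a minimal sketch rather than part of our standard tooling: the name upload_and_verify is made up for illustration, and it assumes the remote has already been configured. It uses rclone check with the --one-way flag, which only requires that every source file exists and matches on the destination.

```shell
# Hypothetical helper: copy a local directory to a remote, then verify it.
# Usage: upload_and_verify LOCAL_DIR REMOTE:BUCKET/DIR
upload_and_verify() {
  uv_src=$1
  uv_dest=$2
  # Copy the contents of the source directory, showing progress (-P)
  rclone copy -P "$uv_src" "$uv_dest" || return 1
  # Compare sizes and hashes; --one-way ignores extra files already on the destination
  rclone check --one-way "$uv_src" "$uv_dest"
}
```

With the example above, this would be called as upload_and_verify ~/tax_documents google:oscar/tax_documents.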

Setup

Installation

Rclone can be installed

If you are looking to run rclone on Walnut, instead load the module using:

module load rclone

Configuration

If you are going to be using rclone much, it is worthwhile to create a config file. You can either run the command rclone config and answer the prompts (most of which can be left at their default values), or you can create the file directly with whatever text editor you choose (the config file must be saved as a plain-text file in INI format). However, if you will just be accessing our Google bucket and OMRF object storage, a configuration file with the appropriate values can be found at /Volumes/guth_aci_informatics/software/rclone.conf.

The rclone.conf file should be placed in:

  • Windows 10/11: %homepath%\.config\rclone\rclone.conf

  • Linux variants: $HOME/.config/rclone/rclone.conf

  • macOS: $HOME/.rclone.conf
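Putting the shared configuration file in place can be sketched as a small shell function. This is an illustration only: the name install_rclone_conf is hypothetical, the default destination is the Linux location listed above, and the function deliberately refuses to overwrite a config file that already exists.

```shell
# Hypothetical helper: put a copy of a shared rclone.conf in place.
# Usage: install_rclone_conf SOURCE_CONF [DEST_CONF]
install_rclone_conf() {
  irc_src=$1
  # Default to the Linux location listed above
  irc_dest=${2:-"$HOME/.config/rclone/rclone.conf"}
  if [ -e "$irc_dest" ]; then
    echo "refusing to overwrite existing $irc_dest" >&2
    return 1
  fi
  mkdir -p "$(dirname "$irc_dest")" || return 1
  cp "$irc_src" "$irc_dest" || return 1
  # The file holds credentials, so make it readable by the owner only
  chmod 600 "$irc_dest"
}
```

For example, install_rclone_conf /Volumes/guth_aci_informatics/software/rclone.conf copies the shared file to the default location. Afterwards, running rclone config file prints the path of the configuration file that rclone is actually reading, which is a quick way to check that the file landed in the right place.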

If you wish to directly create the config file, it should look like:

example rclone.conf
[{{NICKNAME}}]
type = swift
env_auth = false
user = {{OMRF_USERNAME}}
key = {{OMRF_PASSWORD}}
auth = https://o3.omrf.org/auth/v2.0
tenant = {{TENANT_NAME}}
endpoint_type = public

[amazon]
type = s3
provider = AWS
env_auth = true
region = us-east-1

[gcloud]
type = google cloud storage
project_number = {{GCLOUD_PROJECT}}
service_account_file = {{GCLOUD_STORAGE_KEY}}
location = {{GCLOUD_STORAGE_REGION}}
object_acl = bucketOwnerFullControl
bucket_acl = authenticatedRead
bucket_policy_only = true

The value for NICKNAME does not need to match anything in particular. It is just a name that you assign to that source and then use whenever you access it in a command. For example,

example rclone command
rclone copy -P NICKNAME:home/pictures OTHER_DEST_NICKNAME:home/pictures

The same is true for amazon and gcloud above.
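Because the nicknames are simply the section headers of the config file, they can be listed with standard tools. The sketch below assumes a config file laid out like the example above; the function name list_conf_nicknames is hypothetical. For the config file that rclone is actively using, the built-in rclone listremotes command gives the same kind of list.

```shell
# Hypothetical helper: print the nicknames (section headers) in an rclone.conf,
# one per line with a trailing colon, ready to paste into a command.
# Usage: list_conf_nicknames PATH_TO_RCLONE_CONF
list_conf_nicknames() {
  # Keep only lines of the form [name] and rewrite them as name:
  sed -n 's/^\[\(.*\)\][[:space:]]*$/\1:/p' "$1"
}
```

Each printed name can be used directly as the prefix of a path, for example rclone lsd gcloud: to list the buckets of the remote nicknamed gcloud.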

Object storage

See the Object storage section in the Local computing resources page for more information.

Google cloud

To set up Google cloud storage, you will need a few pieces of information, namely:

  • project_id: At the moment, we make use of just one project, Guthridge-NIH-STRIDES-Projects. This is also often used in its lowercase form, mostly on the command line and in files such as the Rclone config.

  • storage access key: Follow the link for instructions on how to retrieve a storage access key. Currently, there is one placed in /Volumes/guth_aci_informatics/software/guthridge-nih-strides-projects-storage-key.json on Walnut (or \\qlotsam\guth_aci_informatics\software\guthridge-nih-strides-projects-storage-key.json in Windows)

  • bucket region: see the documentation for Regions and zones. All of our resources should be located in us-central1 (i.e. located in Iowa)
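Filled in with the values listed above, the gcloud section of the config file might look like the output of the sketch below. This is an assumption-laden illustration: the function name print_gcloud_remote is hypothetical, the lowercase project name is used for project_number as described above, and the remaining settings are copied from the example rclone.conf.

```shell
# Hypothetical helper: print a [gcloud] section for rclone.conf.
# Usage: print_gcloud_remote PATH_TO_STORAGE_KEY_JSON
print_gcloud_remote() {
  # $1 is the path to the storage access key file
  cat <<EOF
[gcloud]
type = google cloud storage
project_number = guthridge-nih-strides-projects
service_account_file = $1
location = us-central1
object_acl = bucketOwnerFullControl
bucket_acl = authenticatedRead
bucket_policy_only = true
EOF
}
```

On Walnut, the output could be appended to an existing config file with print_gcloud_remote /Volumes/guth_aci_informatics/software/guthridge-nih-strides-projects-storage-key.json >> $HOME/.config/rclone/rclone.conf, provided the file does not already contain a gcloud section.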

Using without a config file

If you will be using a particular source only very infrequently, you can access any of the object storage “tenants” with the following, replacing the bracketed variables with their respective values:

rclone \
  --swift-tenant "{{TENANT}}" \
  --swift-auth "https://o3.omrf.org/auth/v2.0" \
  --swift-user "{{OMRF_USER_NAME}}" \
  --swift-key "{{OMRF_PASSWORD}}" \
  {{COMMAND}} \
  :swift:

Note that the :swift: in this case is both the name of the remote and the remote type. To reference files and folders in this tenant, place their name directly after the final colon, e.g. :swift:PrecisionMed/analysis/rnaseq/blast

Similarly, one can use rclone to access an HTTP source without any configuration, instead of using something like curl or wget. For example:

rclone copy -P --http-url https://stuff.online/files :http: ./

will download the files to the current directory.
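If the object storage flags shown earlier in this section are needed more than once in a session, they can be wrapped in a shell function so that only the command and paths have to be typed. This is a sketch under stated assumptions: the function name o3 and the environment variables O3_TENANT, O3_USER and O3_KEY are made up for illustration, while the flags and the auth URL are the ones used above.

```shell
# Hypothetical wrapper: run any rclone command against the object storage
# without a config file. Credentials come from environment variables so
# that they are not hard-coded in a script.
# Usage: o3 COMMAND [ARGS...]
o3() {
  if [ -z "${O3_TENANT:-}" ] || [ -z "${O3_USER:-}" ] || [ -z "${O3_KEY:-}" ]; then
    echo "set O3_TENANT, O3_USER and O3_KEY first" >&2
    return 1
  fi
  # Everything after the flags is passed through to rclone unchanged
  rclone \
    --swift-tenant "$O3_TENANT" \
    --swift-auth "https://o3.omrf.org/auth/v2.0" \
    --swift-user "$O3_USER" \
    --swift-key "$O3_KEY" \
    "$@"
}
```

After exporting the three variables, o3 lsd :swift: would list the top-level directories of the tenant, and o3 copy -P :swift:PrecisionMed/analysis/rnaseq/blast ./blast would download one folder.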

For the possible commands, see the website, but likely you will use one of the following:

  • lsf - list files

  • lsd - list directories

  • copy - copy from SOURCE to DESTINATION. This will overwrite files in DESTINATION if there is a newer version in SOURCE

  • move - same as copy, except that the files are removed from SOURCE once they have been transferred

  • delete - WARNING Do NOT use this unless you are absolutely sure. You cannot recover the files.

  • sync - make the contents of DESTINATION identical to those of SOURCE. Unlike copy, this will not only update files in DESTINATION but also delete any that are not present in SOURCE
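Because sync and delete remove files, it is worth previewing them first. Rclone's --dry-run flag reports what a command would do without changing anything. The function below is a minimal sketch of that habit; the name careful_sync and its apply argument are hypothetical.

```shell
# Hypothetical helper: preview a sync by default, and only make changes
# when the third argument is the word "apply".
# Usage: careful_sync SOURCE DESTINATION [apply]
careful_sync() {
  cs_src=$1
  cs_dest=$2
  cs_mode=${3:-preview}
  if [ "$cs_mode" = "apply" ]; then
    # Really synchronize, showing progress
    rclone sync -P "$cs_src" "$cs_dest"
  else
    # Only report what would be copied or deleted
    rclone sync --dry-run "$cs_src" "$cs_dest"
  fi
}
```

A typical use would be to run careful_sync source:home/pictures destination:home/pictures, read the list of planned changes, and then repeat the command with apply added once the list looks right.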

Important

NOTE that rclone is a little odd in that it will copy all of the contents of a directory, but not the directory itself! This means that if you run the command

rclone copy -P source:home/pictures/ destination:home/

all of the files in the source pictures subdirectory would be copied into home itself. You need to include the destination directory as well:

rclone copy -P source:home/pictures/ destination:home/pictures
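One way to avoid forgetting the destination directory is to let a small function append it. The sketch below is hypothetical (the name copy_dir is not an rclone command): it takes the last component of the source path and adds it to the destination before calling rclone copy.

```shell
# Hypothetical helper: copy a directory *as a directory*, i.e. into a
# folder of the same name under the destination.
# Usage: copy_dir SOURCE_DIR DESTINATION_PARENT
copy_dir() {
  cd_src=${1%/}            # drop one trailing slash, if any
  cd_dest=${2%/}
  cd_name=${cd_src##*/}    # last path component
  cd_name=${cd_name##*:}   # drop a leading "remote:" if there was no slash
  if [ -z "$cd_name" ]; then
    echo "cannot work out a directory name from '$1'" >&2
    return 1
  fi
  case "$cd_dest" in
    *:) cd_target="$cd_dest$cd_name" ;;   # destination is the root of a remote
    *)  cd_target="$cd_dest/$cd_name" ;;
  esac
  rclone copy -P "$cd_src" "$cd_target"
}
```

With this, copy_dir source:home/pictures/ destination:home/ runs the second command shown above, placing the files in destination:home/pictures.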

Useful Parameters

There are several command arguments that can be very useful.

  • -P: print live progress

  • --include="PATTERN": restricts copying/moving/deleting to the subset of files that match the glob pattern. Put the pattern inside quotes so that the shell does not expand it before rclone sees it. For example, to copy only BAM files: rclone copy -P source:directory/ dest:directory --include="*.bam". Look up “glob pattern” for more info.

  • --exclude="PATTERN": copy/move/delete everything EXCEPT files that match the pattern.
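The --include flag can be given more than once when several kinds of file are wanted, for example BAM files together with their index files. The function below is a small sketch of that; the name copy_matching is hypothetical, and each pattern passed to it becomes its own --include flag.

```shell
# Hypothetical helper: copy only the files matching one or more glob patterns.
# Usage: copy_matching SOURCE DESTINATION PATTERN [PATTERN...]
copy_matching() {
  cm_src=$1
  cm_dest=$2
  shift 2
  # Rewrite the remaining arguments from "PATTERN ..." to "--include PATTERN ..."
  for cm_pattern in "$@"; do
    shift
    set -- "$@" --include "$cm_pattern"
  done
  rclone copy -P "$cm_src" "$cm_dest" "$@"
}
```

For example, copy_matching source:directory/ dest:directory "*.bam" "*.bai" copies only the files ending in .bam or .bai. As with the flags themselves, the patterns need to be quoted.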