Rclone
Rclone is a program for transferring data to and from a variety of storage services. It is most useful
for interacting with object storage (o3://) and Amazon or Google bucket storage (s3:// or gs://).
It is, however, capable of downloading from and uploading to a wide variety of other sources, including
HTTP, FTP, and Dropbox.
Quickstart
Our primary use of Rclone is to copy from local data storage to remote locations. Once rclone has been configured, you can copy data from a local location to a remote source with
rclone copy -P /${LOCAL_DIRECTORY} ${SOURCE}:${BUCKET_NAME}/${REMOTE_DIRECTORY}
or from a remote source to a local location with the reverse
rclone copy -P ${SOURCE}:${BUCKET_NAME}/${REMOTE_DIRECTORY} /${LOCAL_DIRECTORY}
replacing the variables between ${} with actual values. In this instance “local” means a computer that you are
logged into (the actual computer below your desk or a terminal that is directly interacting with a cluster) and the
-P tells Rclone to show progress.
So, for instance, if you wanted to copy the directory tax_documents from the home folder of your local computer to a
bucket named oscar on a remote you’ve given the nickname google in the rclone.conf
file described below, you would use the command
rclone copy -P ~/tax_documents google:oscar/tax_documents
Note that when you copy a directory, you’re actually copying the contents of the directory, so you need to specify
the target directory (things in ~/tax_documents go in the google:oscar/tax_documents directory).
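Before running a transfer for real, it can help to preview it. The sketch below fills in the quickstart variables with the values from the tax_documents example and assembles the upload command with rclone's --dry-run flag, which reports what would be transferred without copying anything. The variable values are illustrative assumptions; substitute your own.

```shell
# Illustrative values taken from the tax_documents example; replace with your own.
LOCAL_DIRECTORY="${HOME}/tax_documents"
SOURCE="google"                 # remote nickname from rclone.conf
BUCKET_NAME="oscar"
REMOTE_DIRECTORY="tax_documents"

# Assemble the command; --dry-run makes rclone report what it would copy
# without transferring anything.
UPLOAD_CMD="rclone copy -P --dry-run ${LOCAL_DIRECTORY} ${SOURCE}:${BUCKET_NAME}/${REMOTE_DIRECTORY}"

# Print the command for review before running it in the terminal.
echo "${UPLOAD_CMD}"
```

Once the dry run lists the files you expect, remove --dry-run and run the same command again to perform the copy.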
Setup
Installation
Rclone can be installed either:
from the rclone website, or
using apt, brew, dnf, or whatever package manager comes with your operating system.
If you are looking to run rclone on Walnut, instead load the module using:
module load rclone
Configuration
If you are going to be using rclone much, it is worthwhile to create a config file. You can either run the command
rclone config and answer the prompts (most of which can be left at their default values), or you can directly
create the file with whatever text editor you choose (though the config file must be saved as a plain
text file in INI format). However, if you will just be accessing our Google bucket and OMRF object
storage, a configuration file with the appropriate values can be found at
/Volumes/guth_aci_informatics/software/rclone.conf.
The rclone.conf file should be placed in:
Windows 10/11:
%homepath%\.config\rclone\rclone.conf
Linux variants:
$HOME/.config/rclone/rclone.conf
MacOS:
$HOME/.rclone.conf
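As a sketch of putting the file in place on a Linux machine, the snippet below creates the config directory, copies the shared configuration if that path is mounted and no config exists yet, and restricts the file's permissions since it can contain credentials. The chmod 600 step is a suggestion rather than something rclone requires. The final step uses rclone config file, rclone's own command for reporting which config file it will read.

```shell
CONFIG_DIR="${HOME}/.config/rclone"
CONFIG_FILE="${CONFIG_DIR}/rclone.conf"
SHARED_CONFIG="/Volumes/guth_aci_informatics/software/rclone.conf"

mkdir -p "${CONFIG_DIR}"

# Copy the shared config only if it is reachable and no config exists yet,
# so an existing file is never overwritten.
if [ -f "${SHARED_CONFIG}" ] && [ ! -e "${CONFIG_FILE}" ]; then
    cp "${SHARED_CONFIG}" "${CONFIG_FILE}"
    chmod 600 "${CONFIG_FILE}"
fi

# Ask rclone which config file it will actually read (skipped if rclone is not installed).
if command -v rclone >/dev/null 2>&1; then
    rclone config file || true
fi
```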
If you wish to directly create the config file, it should look like:
[{{NICKNAME}}]
type = swift
env_auth = false
user = {{OMRF_USERNAME}}
key = {{OMRF_PASSWORD}}
auth = https://o3.omrf.org/auth/v2.0
tenant = {{TENANT_NAME}}
endpoint_type = public

[amazon]
type = s3
provider = AWS
env_auth = true
region = us-east-1

[gcloud]
type = google cloud storage
project_number = {{GCLOUD_PROJECT}}
service_account_file = {{GCLOUD_STORAGE_KEY}}
location = {{GCLOUD_STORAGE_REGION}}
object_acl = bucketOwnerFullControl
bucket_acl = authenticatedRead
bucket_policy_only = true
The value for NICKNAME does not need to match anything in particular; it is just a name that you
assign to that source and use whenever you access it in a command. For example,
rclone copy -P NICKNAME:home/pictures OTHER_DEST_NICKNAME:home/pictures
The same is true for amazon and gcloud above.
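To see how nicknames relate to the file, note that each bracketed section header is one remote. The sketch below writes a small throwaway config (the names omrf and amazon are chosen only for illustration) and extracts the section names with sed, printing them in the nickname: form used on the command line. Against a real config, rclone listremotes gives the same kind of listing.

```shell
# Throwaway config with two illustrative remotes.
DEMO_CONF="$(mktemp)"
cat > "${DEMO_CONF}" <<'EOF'
[omrf]
type = swift

[amazon]
type = s3
EOF

# Each [section] header is a remote nickname; print it with the trailing colon.
REMOTES="$(sed -n 's/^\[\(.*\)]$/\1:/p' "${DEMO_CONF}")"
echo "${REMOTES}"

rm -f "${DEMO_CONF}"
```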
Object storage
See the Object storage section in the Local computing resources page for more information.
Google cloud
To set up Google cloud storage, you will need a few pieces of information. Namely:
project_id: At the moment, we just make use of one project, Guthridge-NIH-STRIDES-Projects. This is also often used in its lowercase form, mostly in command-line instances such as in the Rclone config.
storage access key: Follow the link for instructions on how to retrieve a storage access key. Currently, there is one placed in
/Volumes/guth_aci_informatics/software/guthridge-nih-strides-projects-storage-key.json on Walnut (or \\qlotsam\guth_aci_informatics\software\guthridge-nih-strides-projects-storage-key.json in Windows)
bucket region: see the documentation for Regions and zones. All of our resources should be located in
us-central1 (i.e. located in Iowa)
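Putting those three values together, a filled-in gcloud stanza might look like the following sketch. It writes the stanza to a scratch file rather than to your real rclone.conf so that it can be reviewed first. The remaining keys are copied from the template in the Configuration section, and using the lowercase project name for project_number is an assumption based on the note above, not something verified here.

```shell
# Write the stanza to a scratch file for review (not to the live rclone.conf).
STANZA_FILE="$(mktemp)"
cat > "${STANZA_FILE}" <<'EOF'
[gcloud]
type = google cloud storage
project_number = guthridge-nih-strides-projects
service_account_file = /Volumes/guth_aci_informatics/software/guthridge-nih-strides-projects-storage-key.json
location = us-central1
object_acl = bucketOwnerFullControl
bucket_acl = authenticatedRead
bucket_policy_only = true
EOF

# Review the stanza, then add it to rclone.conf by hand if it looks right.
cat "${STANZA_FILE}"
```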
Using without a config file
If you will be using a particular source only very infrequently, you can access any of the object storage “tenants” with the following, replacing the bracketed variables with their respective values:
rclone \
--swift-tenant "{{TENANT}}" \
--swift-auth "https://o3.omrf.org/auth/v2.0" \
--swift-user "{{OMRF_USER_NAME}}" \
--swift-key "{{OMRF_PASSWORD}}" \
{{COMMAND}} \
:swift:
Note that the :swift: in this case is both the name of the remote
and the remote type. To reference files and folders in this tenant,
place their name directly after the colon,
i.e. :swift:PrecisionMed/analysis/rnaseq/blast
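Because command-line flags can end up in shell history, an alternative sketch is to supply the same settings through environment variables. Rclone reads any flag from a variable named RCLONE_ plus the flag name in upper case with dashes turned into underscores, so --swift-user becomes RCLONE_SWIFT_USER. The tenant and user values below are placeholders, and the helper name list_swift is invented for this example.

```shell
# Placeholder values; replace with your own tenant and OMRF user name.
export RCLONE_SWIFT_TENANT="my_tenant"
export RCLONE_SWIFT_AUTH="https://o3.omrf.org/auth/v2.0"
export RCLONE_SWIFT_USER="my_omrf_username"
# Set RCLONE_SWIFT_KEY in your session (for example by prompting for it)
# rather than hard-coding the password in a script.

# Hypothetical helper: list directories under a path in the tenant,
# e.g. list_swift PrecisionMed/analysis
list_swift() {
    rclone lsd ":swift:$1"
}
```

With the variables exported, calling list_swift PrecisionMed/analysis/rnaseq would run rclone lsd against that path without any --swift-* flags on the command line.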
Similarly, one can use rclone to access an HTTP source without configuration instead of using something like curl or wget. For example:
rclone copy -P --http-url https://stuff.online/files :http: ./
will download files to the present directory.
For the possible commands, see the website, but likely you will use one of the following:
lsf - list files
lsd - list directories
copy - copy from SOURCE to DESTINATION. This will overwrite files in DESTINATION if there is a newer version in SOURCE
move - same as copy, but the files are removed from SOURCE after they are transferred
delete - WARNING Do NOT use this unless you are absolutely sure. You cannot recover the files.
sync - synchronize the contents in DESTINATION with those in SOURCE. Unlike copy, this will overwrite any existing files in DESTINATION and delete any that are not present in SOURCE
Important
NOTE that rclone is a little odd in that it will copy all of the contents of a directory, but not the directory itself! This means that if you run the command
rclone copy -P source:home/pictures/ destination:home/
all of the files in the source pictures subdirectory would be copied
into home itself. You need to include the destination directory as
well:
rclone copy -P source:home/pictures/ destination:home/pictures
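One way to avoid forgetting the destination directory is to derive it from the source. The helper below is a hypothetical convenience function, not part of rclone: it takes a source path and a destination parent and prints the destination with the source directory's name appended.

```shell
# Hypothetical helper: build a destination that ends with the source directory's name.
dest_with_dirname() {
    src="${1%/}"                        # drop a trailing slash, if any
    name="$(basename "${src#*:}")"      # strip the "remote:" prefix, keep the last path component
    dest="${2%/}"
    case "${dest}" in
        *:) printf '%s%s\n' "${dest}" "${name}" ;;   # bare remote, e.g. "destination:"
        *)  printf '%s/%s\n' "${dest}" "${name}" ;;
    esac
}

dest_with_dirname "source:home/pictures/" "destination:home/"
# prints: destination:home/pictures
```

The result can be passed straight to rclone, for example rclone copy -P source:home/pictures/ "$(dest_with_dirname source:home/pictures/ destination:home/)".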
Useful Parameters
There are several command arguments that can be very useful.
-P: print live progress
--include="PATTERN": this will restrict copying/moving/deleting to a subset of files that match the glob pattern. Include the pattern inside of quotes. For example, to only copy bam files:
rclone copy -P source:directory/ dest:directory --include="*.bam"
Look up “glob pattern” for more info.
--exclude="PATTERN": copy/move/delete everything EXCEPT files that match the pattern.
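To get a feel for what a pattern such as *.bam selects, the sketch below tests a few made-up file names with the shell's own pattern matching. This is only an approximation: for simple file-name patterns the shell and rclone agree, but rclone's filter syntax has extras (such as ** and {a,b}) that the shell's case statement does not handle the same way. For a definitive preview, add --dry-run to the actual rclone command.

```shell
# Return success if file name $1 matches glob pattern $2 (shell matching rules).
matches_pattern() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Made-up file names standing in for the contents of source:directory/.
SELECTED=""
for name in sample1.bam sample1.bam.bai sample2.bam notes.txt; do
    if matches_pattern "${name}" "*.bam"; then
        SELECTED="${SELECTED}${name} "
    fi
done

# Names that an --include="*.bam" style pattern would keep.
echo "${SELECTED}"
```

Note that sample1.bam.bai is not selected, because the pattern must match the whole file name and that name ends in .bai.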