Transfer
scp
You can transfer files with the scp command. The first argument is the source file and the second argument is the target location. For example, the following command, run on your computer, copies a file from your computer to the data directory on the cluster. Because data does not start with /, scp interprets it relative to your home directory on the cluster.
scp my_local_file.txt <shortname>@cluster.s3it.uzh.ch:data
To copy a file from the cluster, specify the server and the remote path as the first argument and the local path as the second. For example, the following command, run on your computer, copies job_results.txt from your scratch directory on the cluster to your computer.
scp <shortname>@cluster.s3it.uzh.ch:/scratch/<shortname>/job_results.txt .
The . (i.e., "dot") character stands for the current directory. You can specify any other location either with an absolute path or with a path relative to your current directory.
You can also transfer a whole directory by adding the -r (recursive) flag.
scp -r my/local/dir <shortname>@cluster.s3it.uzh.ch:scratch/target
rsync
For transfers that involve many files or directories, however, it is often more efficient to use rsync. This program synchronises files between the source and the destination and skips files that are already up to date at the destination. If your transfer fails partway, or if only some of your files have been updated, rerunning rsync therefore does not transfer again the data that is already identical in both locations. For example, the following command can be used in place of the previous scp command.
rsync -az --progress my/local/dir <shortname>@cluster.s3it.uzh.ch:scratch/target
As with scp, the first location is the source file or directory and the second is the target location. The -a flag enables archive mode. Roughly speaking, this mode recreates the structure and permissions of the source directory on the target machine: it copies recursively and preserves symbolic links, permissions, and modification times. The -z flag instructs rsync to compress the data before the transfer. This can make the transfer faster, especially over a slow connection. As the name suggests, the --progress option shows progress information during the transfer.
Before running the synchronisation, you can add -n (dry run) to preview which files would be transferred. In this case, combine it with --progress (or -v). Otherwise, rsync does not print anything.
rsync -azn --progress my/local/dir <shortname>@cluster.s3it.uzh.ch:scratch/target
You can exclude files and directories from synchronisation with --exclude. This option can be given multiple times. For example, the following command ignores all files and directories named cache, as well as all files with the .tmp extension. Because it includes -n, it only previews the result.
rsync -azn --progress --exclude='cache' --exclude='*.tmp' my/local/dir <shortname>@cluster.s3it.uzh.ch:scratch/target
By default, rsync does not remove files from the target directory, even if they have been deleted from the source directory. Deleting such files can be enabled with --delete. It is strongly recommended to preview the changes with -n before running rsync with the --delete flag. If you specify the wrong target directory, every file in it that does not exist in the source will be deleted without confirmation.
rsync -az --progress --delete <shortname>@cluster.s3it.uzh.ch:scratch/source my/local/target
A trailing slash at the end of the source directory instructs rsync to synchronise the contents of the source directory rather than the directory itself. Suppose, for example, that the source directory scratch/data contains a single file, test.txt. If you do not add the trailing slash (i.e., /), rsync creates a data directory inside your target directory and transfers the contents there.
rsync -az <shortname>@cluster.s3it.uzh.ch:scratch/data my/local/target
ls my/local/target
# data
ls my/local/target/data
# test.txt
If you add the trailing slash /, rsync places test.txt directly into your target directory.
rsync -az <shortname>@cluster.s3it.uzh.ch:scratch/data/ my/local/target
ls my/local/target
# test.txt
Sharing data
Data sharing among cluster users is managed through Active Directory (AD) groups. Each AD group corresponds to an account in ScienceCluster.
To share data with other users in your group, you need the name of the ScienceCluster account to which your username belongs. You use this name to construct the group argument of the sharing commands described below.
For example, if user asmith would like to share the project1 directory with the matrix.uzh group, they can change its group ownership recursively.
$ chgrp -R S3IT_T_hpc_matrix.uzh /scratch/asmith/project1
$ ls -ld /scratch/asmith/project1
drwxrwx--- 1 asmith S3IT_T_hpc_matrix.uzh 1 May 26 12:26 /scratch/asmith/project1
$ ls -l /scratch/asmith/project1/
-rw-rw---- 1 asmith S3IT_T_hpc_matrix.uzh 0 May 26 12:26 data.txt.xz
Note: the group argument used in the command is S3IT_T_hpc_matrix.uzh. A group titled esheep.uzh would use S3IT_T_hpc_esheep.uzh. In other words, add S3IT_T_hpc_ before the group/account name to construct the correct argument for the command.
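If you write scripts that change group ownership, the naming rule above can be wrapped in a small function. The name sc_group is hypothetical and is not a tool provided on ScienceCluster. It only prepends S3IT_T_hpc_ to the account name.

```shell
#!/bin/sh
# Hypothetical helper (not a ScienceCluster tool): build the chgrp group
# argument by prefixing the account name with S3IT_T_hpc_.
sc_group() {
  printf 'S3IT_T_hpc_%s\n' "$1"
}

sc_group matrix.uzh   # S3IT_T_hpc_matrix.uzh
sc_group esheep.uzh   # S3IT_T_hpc_esheep.uzh
```

For example, chgrp -R "$(sc_group matrix.uzh)" /scratch/asmith/project1 is equivalent to the chgrp command shown above.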
In this example, only the members of matrix.uzh will be able to access project1, and only if they know its exact path. Alternatively, asmith can share their whole scratch directory read-only with the matrix.uzh group.
$ chgrp -R S3IT_T_hpc_matrix.uzh /scratch/asmith
$ chmod -R g+rX /scratch/asmith
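The capital X in g+rX adds group execute (search) permission only to directories and to files that already have execute permission for some user. Group members can therefore list and enter directories, but ordinary data files do not become executable. Because g+w is not added, the group gets read-only access. The following sketch shows this on a temporary directory on a Linux machine. Your own primary group (id -g) stands in for S3IT_T_hpc_matrix.uzh, and the file names are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox standing in for /scratch/asmith/project1.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/project1"
echo "x" > "$work/project1/data.txt"
chmod 700 "$work/project1"
chmod 600 "$work/project1/data.txt"

# Your own primary group (numeric GID) stands in for S3IT_T_hpc_matrix.uzh.
chgrp -R "$(id -g)" "$work"
chmod -R g+rX "$work"

# Directories: drwxr-x--- (group may list and enter).
# Plain file:  -rw-r----- (group may read; no execute bit added).
ls -ld "$work/project1" "$work/project1/data.txt"
```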