Data Transfer with Globus
Globus is a data transfer service that grew out of the open-source Globus Toolkit for grid computing, and it uses a GridFTP implementation to transfer massive datasets. Globus can transfer data at high rates between two Globus endpoints.
Globus endpoints
ScienceCluster
Science IT manages an official UZH Globus endpoint on the ScienceCluster, which lets you transfer data between its filesystems and any other Globus endpoint to which you have access.
Supercomputer (Daint and Eiger clusters on Alps at CSCS)
CSCS manages an official Globus endpoint on the Alps Supercomputer. Globus is the recommended method of data transfer to or from the Supercomputer.
Globus Connect Personal
Globus Connect Personal is an application that allows a user to turn their personal laptop or workstation into a fully functional Globus endpoint for sending/receiving data. Installation onto a ScienceCloud VM is also an option.
Installation instructions for Linux.
Transfer types
Globus transfers between official institutional endpoints (such as universities or supercomputing centers) are free of charge. They require that you have authentication credentials at both institutions (or more generally, at both Globus endpoints).
Transfers between an institutional Globus endpoint to which you have access and your own Globus Connect Personal endpoint are also free of charge.
Note
Globus allows transfers free of charge when either the sender or the recipient is using an officially maintained Globus endpoint. In other words, at most one Globus Connect Personal endpoint can be involved in a free transfer. Transferring between two Globus Connect Personal endpoints requires a Globus subscription. See here for more details.
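The rule in this note can be written down as a small check. The following is a minimal sketch and not part of Globus itself: the names `EndpointKind` and `needs_subscription` are hypothetical and only restate the logic described above.

```python
from enum import Enum


class EndpointKind(Enum):
    """Hypothetical labels for the two kinds of endpoint discussed here."""

    INSTITUTIONAL = "institutional"  # officially maintained, e.g. the ScienceCluster
    PERSONAL = "personal"  # Globus Connect Personal on a laptop, workstation or VM


def needs_subscription(source: EndpointKind, destination: EndpointKind) -> bool:
    """Return True only when both ends are Globus Connect Personal endpoints."""
    return source is EndpointKind.PERSONAL and destination is EndpointKind.PERSONAL


if __name__ == "__main__":
    # Print the verdict for every combination of source and destination.
    for src in EndpointKind:
        for dst in EndpointKind:
            verdict = "subscription required" if needs_subscription(src, dst) else "free"
            print(f"{src.value} -> {dst.value}: {verdict}")
```

Of the four combinations, only personal-to-personal is reported as requiring a subscription.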
Note
Globus can be used to transfer data directly to or from external collaborators only if both of you have access to a common Globus endpoint. If an external collaborator's institution has a Globus subscription, they may be able to share data with you on their Globus endpoint.
We recommend using Globus if you wish to transfer data between the ScienceCluster and the Supercomputer. More generally, we suggest using Globus for transfers of large amounts of data where the source and destination both have a Globus endpoint.
Step-by-step guide
- Begin by going to https://www.globus.org/ and clicking the "Log In" button at the top of the page.
- At this point, you should be prompted to log in via your "existing organizational login". To find UZH, simply search for "Zurich".
- Click on Universität Zürich in the drop-down, then click Continue to be forwarded to the SWITCH edu-ID authentication.
- After logging in, click "File Manager" on the menu on the left. If you wish to select the ScienceCluster endpoint, search for "S3IT" in the Collection search bar. When doing so, you should see the "S3IT UZH globus endpoint" appear in the suggestions.
- Click on the "Continue" button when prompted with "Please authenticate to access" the selected Globus endpoint.
- You will then be redirected to another login screen. For ScienceCluster, click on the integer username, as shown in the example screenshot. At the next screen, click "Allow".
- At this point, you should see your files on the ScienceCluster (or other endpoint system) appear in the File Manager area. To transfer to (or from) another Globus endpoint, click the second panel option at the top right of the page and search for the other endpoint. When prompted, enter your credentials for the institution that hosts that endpoint.
- Ensure that the other endpoint loads correctly in the right panel (i.e. you can navigate to view files or directories). Filesystems for both endpoints should now be visible in the dual panel view. Contact the endpoint host institution if you have trouble finding or loading their endpoint.
- Specify any "Transfer & Sync Options" that you require, then select all the files and directories that you would like to transfer. Select the destination directory on the other endpoint, then press "Start" to begin your transfer. Files and directories can be transferred in either direction between the two endpoints.
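The same kind of transfer can also be scripted with the Globus command-line interface instead of the web File Manager. The sketch below only assembles a `globus transfer` command and prints it; it does not run anything. It assumes the Globus CLI is installed and that you have already authenticated with `globus login`. The endpoint IDs and paths are placeholders, and `build_transfer_command` is a hypothetical helper name.

```python
import shlex
from typing import List, Optional

# Sync levels accepted by the `--sync-level` option of `globus transfer`.
SYNC_LEVELS = ("exists", "size", "mtime", "checksum")


def build_transfer_command(
    source_id: str,
    source_path: str,
    dest_id: str,
    dest_path: str,
    label: Optional[str] = None,
    recursive: bool = True,
    sync_level: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) a `globus transfer` command line."""
    if sync_level is not None and sync_level not in SYNC_LEVELS:
        raise ValueError(f"sync_level must be one of {SYNC_LEVELS}")
    command = ["globus", "transfer"]
    if recursive:
        command.append("--recursive")  # needed when the source is a directory
    if label:
        command += ["--label", label]
    if sync_level:
        command += ["--sync-level", sync_level]
    # Each side is written as ENDPOINT_ID:PATH.
    command += [f"{source_id}:{source_path}", f"{dest_id}:{dest_path}"]
    return command


if __name__ == "__main__":
    # Placeholder IDs and paths: replace them with your own values.
    cmd = build_transfer_command(
        "SOURCE-ENDPOINT-ID", "/data/myuser/project/",
        "DEST-ENDPOINT-ID", "/scratch/myuser/project/",
        label="test transfer", sync_level="checksum",
    )
    print(shlex.join(cmd))
```

Printing the command first makes it easy to review before running it in a shell. With `--sync-level checksum`, files whose checksums already match at the destination are skipped.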
Note
We highly recommend testing the transfer of a small dataset or directory of files before performing a large transfer.
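One way to assemble such a small test set is to pick the smallest files from the directory you intend to transfer, up to a size budget. This is a minimal sketch using only the Python standard library; `pick_sample` and the 50 MiB budget are illustrative assumptions, not a Globus feature.

```python
from pathlib import Path
from typing import List


def pick_sample(root: Path, max_bytes: int) -> List[Path]:
    """Return the smallest files under `root` whose combined size fits in `max_bytes`."""
    # Sort by size first, then by path so that ties are ordered deterministically.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: (p.stat().st_size, str(p)),
    )
    sample: List[Path] = []
    total = 0
    for path in files:
        size = path.stat().st_size
        if total + size > max_bytes:
            break  # every remaining file is at least this large
        sample.append(path)
        total += size
    return sample


if __name__ == "__main__":
    # Example: list up to 50 MiB of the smallest files in the current directory.
    for path in pick_sample(Path("."), 50 * 1024**2):
        print(path)
```

The listed files can then be selected in the File Manager for a trial run before the full transfer is started.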
If you have your files on a ScienceCloud VM, a ScienceCloud volume, or elsewhere on UZH technical infrastructure, consider setting up Globus Connect Personal in order to take advantage of Globus.
Be mindful of your quota on ScienceCluster: if the total size of the files that you want to transfer to the ScienceCluster is too large for the /data and /home filesystems, consider transferring to scalable storage or to the temporary /scratch filesystem.
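Before starting a large transfer, it can help to measure the source directory and compare it with the space you have left at the destination. The sketch below is a rough local check under stated assumptions: `directory_size_bytes` and `fits` are hypothetical helpers, the 10% safety margin is arbitrary, and the remaining quota must be looked up separately with whatever tool your destination system provides.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits(size_bytes: int, free_bytes: int, safety_margin: float = 0.1) -> bool:
    """Return True if the data fits while leaving `safety_margin` of the free space unused."""
    return size_bytes <= free_bytes * (1 - safety_margin)


if __name__ == "__main__":
    # Example: measure the current directory; supply your own remaining quota in bytes.
    size = directory_size_bytes(".")
    remaining_quota = 100 * 1024**3  # assumed value, for illustration only
    print(f"{size / 1024**3:.2f} GiB to transfer; fits: {fits(size, remaining_quota)}")
```

If the check fails for /data or /home, the alternatives mentioned above (scalable storage or /scratch) are the places to consider.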
Once a transfer is started, you can monitor progress, and you will receive an email notification when the transfer is complete (unless you have disabled that option).