GPUs on ScienceCloud¶
It is possible to launch an instance with GPU support.
To use a GPU device on your ScienceCloud instance, select one of the "GPU enabled" flavors in the launch wizard.
"GPU enabled" flavors behave differently from the regular ones; please read the section "GPU instances specific caveats" before launching one.
GPU models available on the cloud¶
There are currently two GPU models that you can choose from:
- NVIDIA Tesla P4 with 8 GB of onboard RAM
- NVIDIA Tesla T4 with 16 GB of onboard RAM
You can find more information, as well as the respective datasheets, on the NVIDIA website.
GPU enabled flavors¶
The "GPU enabled" flavors are public, i.e. any user can launch an instance with a GPU attached without the need for further interaction with Science IT support.
The naming of the flavors follows the usual scheme with the addition of the suffix "-gpu" followed by the model of the GPU device:
- -gpuP4 for the NVIDIA Tesla P4
- -gpuT4 for the NVIDIA Tesla T4
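The suffix convention above can be checked programmatically, for example when filtering a list of flavor names. The following is a minimal sketch: the function name `parse_gpu_flavor` is hypothetical, and reading the digit in names such as `64cpu-256ram-hpcv3-2gpuT4` as the number of attached GPUs is an assumption, not something stated on this page.

```python
import re

# Matches the "-gpu<model>" suffix described above. The optional digits
# before "gpu" cover names such as "...-2gpuT4"; treating them as a GPU
# count is an assumption.
GPU_SUFFIX = re.compile(r"-(\d*)gpu(P4|T4)$")


def parse_gpu_flavor(name):
    """Return (gpu_count, gpu_model) for a GPU flavor name, else None."""
    match = GPU_SUFFIX.search(name)
    if match is None:
        return None
    count = int(match.group(1)) if match.group(1) else 1
    return count, match.group(2)


if __name__ == "__main__":
    for flavor in ("8cpu-32ram-hpcv3-gpuT4", "64cpu-256ram-hpcv3-2gpuT4"):
        print(flavor, parse_gpu_flavor(flavor))
```

A name without the suffix returns `None`, so the same function can be used to separate GPU flavors from regular ones.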
Available GPU flavors¶
Name | Hypervisor CPU | Accelerator |
---|---|---|
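8cpu-64ram-hpcv2-lmem-gpuP4 | Xeon Gold 6126 | NVIDIA Tesla P4 |
16cpu-128ram-hpcv2-lmem-gpuP4 | Xeon Gold 6126 | NVIDIA Tesla P4 |
22cpu-176ram-hpcv2-lmem-gpuP4 | Xeon Gold 6126 | NVIDIA Tesla P4 |
8cpu-32ram-hpcv3-gpuT4 | AMD EPYC 7702 | NVIDIA Tesla T4 |
16cpu-64ram-hpcv3-gpuT4 | AMD EPYC 7702 | NVIDIA Tesla T4 |
32cpu-128ram-hpcv3-gpuT4 (on request) | AMD EPYC 7702 | NVIDIA Tesla T4 |
64cpu-256ram-hpcv3-2gpuT4 (on request) | AMD EPYC 7702 | NVIDIA Tesla T4 |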
Cost contribution¶
There is an additional cost for using a GPU enabled flavor as outlined in the "Service Description: ScienceCloud" document, reachable from the Science IT terms and conditions page (UZH login required).
The T4 and P4 models have the same cost.
You can estimate the total cost by adding the GPU cost specified in the pricing document to the price of the corresponding "regular" flavor.
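As a rough illustration of that sum, the sketch below adds a GPU surcharge to a regular flavor price. The function name, the assumption that both prices are quoted per the same unit of time, and the numbers in the usage example are all placeholders; the actual figures are in the pricing document.

```python
def estimate_total_cost(flavor_rate, gpu_rate, duration):
    """Estimate the cost of a GPU flavor over a given duration.

    flavor_rate: price of the regular flavor per unit of time (placeholder)
    gpu_rate:    additional GPU cost per the same unit of time (placeholder)
    duration:    number of time units the instance is used
    """
    if duration < 0:
        raise ValueError("duration must not be negative")
    return (flavor_rate + gpu_rate) * duration


if __name__ == "__main__":
    # Placeholder rates, not real ScienceCloud prices.
    print(estimate_total_cost(flavor_rate=1.0, gpu_rate=0.5, duration=10))
```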
Images with NVIDIA Driver and CUDA preinstalled¶
Science IT provides public images intended for use with NVIDIA GPUs.
They come with the NVIDIA driver and CUDA preinstalled and ready to use.
You can find the latest image version by searching for a public image whose name starts with "***" and includes "CUDA", for example "***CUDA+Singularity on Ubuntu 20.04 (2024-05-29)".
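If you already have a list of public image names, the search rule above can be applied in a few lines. This is a minimal sketch, assuming that every CUDA image name ends with a date in parentheses, as the example name does; the function name `latest_cuda_image` and all names in the usage list other than the one quoted above are hypothetical.

```python
import re
from datetime import date

# Trailing "(YYYY-MM-DD)" as in the example image name; assuming all CUDA
# images follow this pattern.
DATE_SUFFIX = re.compile(r"\((\d{4})-(\d{2})-(\d{2})\)\s*$")


def latest_cuda_image(names):
    """Return the most recently dated name that starts with '***' and contains 'CUDA'."""
    candidates = []
    for name in names:
        if not (name.startswith("***") and "CUDA" in name):
            continue
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # no recognizable date, skip rather than guess
        year, month, day = (int(part) for part in match.groups())
        candidates.append((date(year, month, day), name))
    if not candidates:
        return None
    return max(candidates)[1]


if __name__ == "__main__":
    images = [
        "***CUDA+Singularity on Ubuntu 20.04 (2024-05-29)",
        "***CUDA+Singularity on Ubuntu 20.04 (2023-11-02)",  # hypothetical
        "***Ubuntu 22.04 (2024-06-01)",                      # hypothetical
    ]
    print(latest_cuda_image(images))
```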
As for other public images, we regularly update the "CUDA" images with the latest packages and retire the oldest ones.
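Because these images ship with the driver preinstalled, a quick way to confirm that a running instance actually sees its GPU is to query `nvidia-smi`, the command-line tool installed with the NVIDIA driver. The sketch below is one possible check, not an official procedure; the helper names are hypothetical, and the function returns an empty list when `nvidia-smi` is missing or fails.

```python
import shutil
import subprocess


def parse_gpu_listing(text):
    """Parse 'name, memory' CSV lines into a list of (name, memory) tuples."""
    gpus = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_listing(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        for name, memory in gpus:
            print(f"{name}: {memory}")
    else:
        print("No NVIDIA GPU visible")
```

An empty result on an instance launched with a GPU flavor may indicate one of the situations described in the caveats section, such as an instance that was resized into a GPU flavor.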
GPU instances specific caveats¶
Instances with GPU support behave differently from the regular ones. The principal differences are:
Supported actions¶
Danger
Pause, Suspend, Shelve and Resize actions are not supported with GPU flavors.
The actions mentioned above are not guaranteed to work.
Triggering one of those actions on a GPU enabled instance might result in an unrecoverable Error state and should not be attempted.
If you resize an instance with no GPU support to a GPU enabled flavor, the resulting instance will lack GPU support even if running on a GPU enabled flavor.
If you need to adjust the size of your instance, or add GPU support to an existing instance, you need to shut down the instance, take a snapshot, and launch a new instance using the snapshot as the boot source.
Once you have checked that everything works as expected, you can delete the old instance.
Maintenance¶
Migration and live-migration actions are not possible. This means that it is not possible for us to move your instance from one physical server to another when we need to perform maintenance on the underlying hardware.
When system maintenance needs to be performed on these servers, instances running on them must be shut down in the best case, or deleted in the worst case.
You are thus strongly advised to take regular backups of your work on a GPU enabled instance.
Availability¶
You can check the current availability of GPU enabled flavors in the ScienceCloud flavor availability report.
Feedback is welcome and needed¶
Any kind of feedback regarding GPUs on ScienceCloud is very welcome. This service is new and we are looking for ways to improve it: your feedback will help us both to optimize the service and to better meet your needs.