HPC with Azure CycleCloud

Azure CycleCloud is designed to support HPC in the cloud, specifically on Azure, and is tightly integrated with the vendor's technologies. The controller is containerized and can run anywhere under Docker.

The only supported cloud is Azure, so it makes little sense to run the container anywhere other than Azure. To get started, install the Azure CLI as documented in Deployment of Kubernetes Cluster onto Various Clouds.

Cost

CycleCloud seems very expensive. Just idling a single 4-core master node of a Slurm cluster costs roughly £1 per hour.

HPC on CycleCloud

Once deployed, CycleCloud is reachable at an FQDN that maps to an external IP (e.g. cyclecloud.westeurope.azurecontainer.io, as named in the deployment). As shown on our deployment at https://cyclecloud.westeurope.azurecontainer.io/cloud/cluster_list, the following schedulers are supported natively:

  1. Slurm
  2. PBS
  3. HTCondor
  4. Grid Engine

The following file systems are supported natively:

  1. BeeGFS
  2. GlusterFS
  3. NFS

Most interesting to us is that both Docker and Singularity are supported natively by CycleCloud.

../_images/CycleCloudGUI.png

Configuration

Configure Azure CycleCloud by following the instructions in the GUI. The tricky part is providing the service principal details so that an Azure subscription becomes available as a cloud provider.

CLI

The command-line interface is critical for any real workload. Download it from the GUI (e.g. https://<site_name>.<location>.azurecontainer.io/download/tools/cyclecloud-cli.zip), unzip it and run install.sh. /etc/paths may need to be updated so that PATH includes /Users/davidyuan/bin.
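As an alternative to editing /etc/paths, the PATH can be extended for the current shell only. The following is a minimal sketch, assuming install.sh placed the cyclecloud executable in the bin directory under the user's home directory (as the path above suggests); the variable name CYCLECLOUD_BIN is made up for illustration.

```shell
# Assumption: install.sh placed the cyclecloud binary in ~/bin.
CYCLECLOUD_BIN="${HOME}/bin"

# Prepend the directory only if PATH does not already contain it.
case ":${PATH}:" in
  *":${CYCLECLOUD_BIN}:"*) ;;
  *) export PATH="${CYCLECLOUD_BIN}:${PATH}" ;;
esac

# Confirm that the shell can now find the CLI.
command -v cyclecloud || echo "cyclecloud not found in ${CYCLECLOUD_BIN}"
```

Adding the same lines to the shell start-up file makes the change persistent for that user without touching system-wide configuration.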

Here are two tutorials to customize the Azure CycleCloud:

Subscription

HPC consumes a significant amount of resources. It is a good idea to create a separate subscription for each project to enforce the separation of resources and accounting. It also makes scripting a bit easier by allowing some parameters to be hard-coded.

az login reports the list of subscriptions and which one is the default. The same information is available via az account list. Create a new subscription via the portal. It is always a good idea to set the subscription you are working in as the default:

az account set --subscription "<subscription_id>"
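To check which subscription is active before and after switching, the listing can be narrowed with a query. This is a small sketch rather than a required step; the function names list_subscriptions and current_subscription_id are made up for illustration, while the az subcommands and the --query and --output options are standard Azure CLI.

```shell
# List every subscription visible to the logged-in account and flag the default.
list_subscriptions() {
  az account list \
    --query "[].{name:name, id:id, isDefault:isDefault}" \
    --output table
}

# Print the ID of the subscription that az commands currently target.
current_subscription_id() {
  az account show --query id --output tsv
}
```

Calling current_subscription_id after az account set should print the ID that was just selected.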

Service principal

At least one service principal is needed to allow CycleCloud to access Azure cloud resources in a subscription. It must be created at the subscription scope:

az ad sp create-for-rbac --scopes="/subscriptions/<subscription_id>"

Take note of the JSON response. The information is needed to create a cloud provider account in the CycleCloud GUI. It is hard to recover later via az ad sp list, and the application secret is not shown again.
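Because the secret cannot be read back later, it may help to write the response straight to a file that only the current user can read. The sketch below wraps the command above; the function name save_service_principal and the default file name cyclecloud-sp.json are hypothetical. The response normally carries the appId, password and tenant fields that the GUI asks for. Recent Azure CLI releases may also expect an explicit --role next to --scopes, so check az ad sp create-for-rbac --help for the installed version.

```shell
# Create the service principal and keep its JSON response in a private file.
# Arguments: 1 = subscription ID, 2 = output file (hypothetical default name).
save_service_principal() {
  subscription_id="$1"
  out_file="${2:-cyclecloud-sp.json}"
  (
    umask 077   # files created in this subshell are readable by the owner only
    az ad sp create-for-rbac \
      --scopes="/subscriptions/${subscription_id}" \
      --output json > "${out_file}"
  )
}

# Example call (commented out because it creates a real service principal):
# save_service_principal "<subscription_id>"
```

Keep the resulting file out of version control and delete it once the cloud provider account has been created.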

Resource group

Use az account list-locations to find a valid location code for a subscription. Note that not all services are available in all locations.

Create a resource group to organize resources for CycleCloud:

az group create --name ${CIName} --location ${Location}
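The commands in this and the following sections rely on the shell variables ${CIName}, ${Location} and ${FQDN}, which the text does not define. A minimal sketch of one consistent set of values follows, inferred from the example FQDN cyclecloud.westeurope.azurecontainer.io used earlier; treat the concrete values as assumptions to adapt per project.

```shell
# Values inferred from the example FQDN in this document; adjust per project.
CIName="cyclecloud"      # reused as resource group, vnet, container and DNS label
Location="westeurope"    # any code reported by `az account list-locations`

# The public name combines the DNS label, the region and the ACI domain.
FQDN="${CIName}.${Location}.azurecontainer.io"

echo "${FQDN}"
```

Reusing a single name for the resource group, the virtual network and the container keeps the later commands short, at the cost of allowing only one CycleCloud instance per name.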

Vnet and subnet

CycleCloud requires three subnets for production. They are needed to create HPC clusters in the GUI.

  • cycle: The subnet in which the CycleCloud server is started
  • compute: A /22 subnet for the HPC clusters
  • user: The subnet for creating user logins

For non-production, one subnet is enough:

az network vnet create --name ${CIName} --resource-group ${CIName} --address-prefix 10.0.0.0/16
az network vnet subnet create --resource-group ${CIName} --vnet-name ${CIName} --name compute --address-prefix 10.0.0.0/22
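For a production layout, the remaining two subnets can be added to the same virtual network. The sketch below assumes ${CIName} is set as in the commands above. The text only fixes the compute subnet at /22, so the /24 prefixes for cycle and user are illustrative assumptions chosen to avoid overlapping 10.0.0.0/22, and the function name create_extra_subnets is made up.

```shell
# Create the two additional production subnets next to "compute".
# Assumption: the /24 prefixes are illustrative; they only need to stay
# inside 10.0.0.0/16 without overlapping the compute subnet (10.0.0.0/22).
create_extra_subnets() {
  for entry in "cycle:10.0.4.0/24" "user:10.0.5.0/24"; do
    subnet_name="${entry%%:*}"     # text before the first colon
    subnet_prefix="${entry#*:}"    # text after the first colon
    az network vnet subnet create \
      --resource-group "${CIName}" \
      --vnet-name "${CIName}" \
      --name "${subnet_name}" \
      --address-prefix "${subnet_prefix}"
  done
}
```

Run create_extra_subnets after the vnet and the compute subnet exist; all three subnets should then be selectable when creating a cluster in the GUI.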

Container instance

CycleCloud is packaged as an RPM, a DEB or a container. The container does not support Kubernetes at present, which means that it cannot run on AKS but can be installed on Azure Container Instances:

az container create \
  --resource-group ${CIName} \
  --location ${Location} \
  --name ${CIName} \
  --dns-name-label ${CIName} \
  --image mcr.microsoft.com/hpc/azure-cyclecloud \
  --ip-address public \
  --ports 80 443 \
  --cpu 2 \
  --memory 4 \
  -e JAVA_HEAP_SIZE=2048 FQDN="${FQDN}"
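Once the command returns, it is worth confirming that the container group is running and reachable before opening the GUI. The following sketch assumes ${CIName} is set as in the command above; the function names are made up for illustration, while az container show and az container logs are standard Azure CLI subcommands.

```shell
# Report the provisioning state of the container group.
container_state() {
  az container show \
    --resource-group "${CIName}" \
    --name "${CIName}" \
    --query provisioningState \
    --output tsv
}

# Print the public FQDN assigned through --dns-name-label.
container_fqdn() {
  az container show \
    --resource-group "${CIName}" \
    --name "${CIName}" \
    --query ipAddress.fqdn \
    --output tsv
}

# Dump the container logs, useful while the CycleCloud server is starting.
container_logs() {
  az container logs --resource-group "${CIName}" --name "${CIName}"
}
```

The name printed by container_fqdn should match the value passed as FQDN to the container; the server may need a while to start after the container group itself reports success.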