Fluid-Slurm-GCP Cluster Configuration

Type: object

A schema to describe cloud-native HPC clusters on Google Cloud Platform.

Type: string Default: "projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-compute-centos-7-centos-v2-3-2"

Built in or custom image for the compute nodes. See https://cloud.google.com/compute/docs/images

Must match regular expression: projects/[a-z]([-a-z0-9]*[a-z0-9])/global/images/[a-z]([-a-z0-9]*[a-z0-9])
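As an illustration, an image path can be checked against the pattern above before a configuration is submitted. This is a minimal sketch in Python: the function name `is_valid_image` is hypothetical, and the use of `fullmatch` (a whole-string match) is an assumption that is stricter than an unanchored pattern check would be.

```python
import re

# Pattern copied verbatim from the schema text above.
IMAGE_PATTERN = re.compile(
    r"projects/[a-z]([-a-z0-9]*[a-z0-9])/global/images/[a-z]([-a-z0-9]*[a-z0-9])"
)


def is_valid_image(path: str) -> bool:
    """Return True if the entire string matches the image pattern.

    Assumption: whole-string matching is wanted; an unanchored check
    would also accept strings that merely contain a valid path.
    """
    return IMAGE_PATTERN.fullmatch(path) is not None
```

Note that the pattern requires the project ID and the image name to be at least two characters long and to end in a lowercase letter or digit.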

Type: string Default: "default"

Service account e-mail address for the compute instances.

Type: string Default: "projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-controler-centos-7-centos-v2-3-2"

Built in or custom image for the controller node. See https://cloud.google.com/compute/docs/images

Must match regular expression: projects/[a-z]([-a-z0-9]*[a-z0-9])/global/images/[a-z]([-a-z0-9]*[a-z0-9])

Type: object

Instance that hosts the Slurm job scheduler

Type: integer

Boot disk size for the controller.

Value must be greater than or equal to 15 and less than or equal to 62000

Type: enum (of string)

Boot disk type for the controller. See https://cloud.google.com/compute/docs/disks

Must be one of:

  • "pd-standard"
  • "pd-ssd"

Type: string

Valid GCP Machine Type. See https://cloud.google.com/compute/docs/machine-types

Type: string

GCP project ID to host the controller.

Must match regular expression: [a-z]([-a-z0-9]*[a-z0-9])

Type: string

Valid VPC subnet that hosts the controller. Format is https://www.googleapis.com/compute/v1/projects/{vpc-project-id}/regions/{region}/subnetworks/{subnetwork}
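The subnet format above can be assembled from its three parts. The following Python sketch is illustrative only: the helper name `subnet_url` and the example values are hypothetical and are not part of the schema.

```python
def subnet_url(vpc_project_id: str, region: str, subnetwork: str) -> str:
    """Build a subnet URL in the format given in the schema text."""
    return (
        "https://www.googleapis.com/compute/v1/"
        f"projects/{vpc_project_id}/regions/{region}/subnetworks/{subnetwork}"
    )


# Hypothetical example values.
print(subnet_url("my-vpc-project", "us-west1", "default"))
```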

Type: string

Valid GCP Zone to host the controller. See https://cloud.google.com/compute/docs/regions-zones

Type: string Default: "default"

Service account e-mail address for the controller instances.

Type: string Default: "default"

Name of the default partition.

Type: string Default: "projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-login-centos-7-centos-v2-3-0"

Built in or custom image for the login node. See https://cloud.google.com/compute/docs/images

Must match regular expression: projects/[a-z]([-a-z0-9]*[a-z0-9])/global/images/[a-z]([-a-z0-9]*[a-z0-9])

Type: array of object

Login node for the HPC cluster.

Must contain at least 1 item

Each item of this array must be:

Type: object

Type: integer

Boot disk size for the login node.

Value must be greater than or equal to 15 and less than or equal to 62000

Type: enum (of string)

Boot disk type for the login node. See https://cloud.google.com/compute/docs/disks

Must be one of:

  • "pd-standard"
  • "pd-ssd"

Type: string

Valid GCP Machine Type. See https://cloud.google.com/compute/docs/machine-types

Type: string

GCP project ID to host the login node.

Must match regular expression: [a-z]([-a-z0-9]*[a-z0-9])

Type: string

Valid VPC subnet that hosts the login node. Format is https://www.googleapis.com/compute/v1/projects/{vpc-project-id}/regions/{region}/subnetworks/{subnetwork}

Type: string

Valid GCP Zone to host the login node. See https://cloud.google.com/compute/docs/regions-zones

Type: string Default: "default"

Service account e-mail address for the login instances.

Type: object

Configuration settings for the G Suite SMTP relay, used to enable mail notifications.

Type: string

Gmail account in your G Suite domain that will send email through the SMTP relay. This is only needed if SMTP authentication is required.

Type: string

Verified domain in your G Suite organization through which emails will be routed. This domain must be associated with an A record.

Type: boolean

A flag to indicate if smtp_authentication is required. If set to true, an email address is required. If set to false, a domain is required.
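The conditional requirement described above (an email address when authentication is on, a domain when it is off) could be checked as in the following Python sketch. The dictionary keys `smtp_authentication`, `email_address`, and `domain` are assumed names for illustration, since this section does not list the property names, and `check_mail_settings` is a hypothetical helper.

```python
def check_mail_settings(settings: dict) -> list:
    """Return a list of problems found in the mail settings (empty if none).

    Key names are assumptions made for this sketch.
    """
    errors = []
    if settings.get("smtp_authentication", False):
        # Authentication required: a sending email address must be present.
        if not settings.get("email_address"):
            errors.append("email_address is required when smtp_authentication is true")
    else:
        # No authentication: a verified domain must be present.
        if not settings.get("domain"):
            errors.append("domain is required when smtp_authentication is false")
    return errors
```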

Type: array of object

An array of Slurm compute partitions.

The following properties are required:

  • machines
  • name
  • project

Must contain at least 1 item

Each item of this array must be:

Type: object

Type: array of object

Must contain at least 1 item

Each item of this array must be:

Type: object

A list of machine types to include in this partition. Defaults to 10 ephemeral standard (non-preemptible) n1-standard-4 instances with a 15 GB boot disk, no GPUs, and no local SSDs.

Type: boolean

Flag to disable (true) or enable (false) hyperthreading. See https://github.com/WyattGorman/Manage_Hyperthreading.sh

Type: integer Default: 15

Boot disk size for each compute node in the machine block.

Value must be greater than or equal to 15 and less than or equal to 62000

Type: enum (of string) Default: "pd-standard"

Boot disk type for each compute node in the machine block. See https://cloud.google.com/compute/docs/disks

Must be one of:

  • "pd-standard"
  • "pd-ssd"

Type: boolean Default: false

Flag to disable/enable external IP addresses on compute instances in the machine block. See https://cloud.google.com/compute/docs/ip-addresses

Type: enum (of integer) Default: 0

The number of GPU accelerators to attach to each compute instance in the machine block. See https://cloud.google.com/compute/docs/gpus

Must be one of:

  • 0
  • 1
  • 2
  • 4
  • 8

Type: enum (of string) Default: ""

GPU accelerator type to attach to compute instances in the machine block. Only certain zones support GPUs. See https://cloud.google.com/compute/docs/gpus

Must be one of:

  • ""
  • "nvidia-tesla-k80"
  • "nvidia-tesla-p100"
  • "nvidia-tesla-v100"
  • "nvidia-tesla-t4"
  • "nvidia-tesla-p4"

Type: string Default: "projects/fluid-cluster-ops/global/images/fluid-slurm-gcp-compute-centos-7-centos-v2-3-2"

Built in or custom image for the compute nodes. Overrides compute_image for this machine block. See https://cloud.google.com/compute/docs/images

Must match regular expression: projects/[a-z]([-a-z0-9]*[a-z0-9])/global/images/[a-z]([-a-z0-9]*[a-z0-9])

Type: string Default: "/scratch"

The directory to mount local SSDs on each compute instance in the machine block (if nlocalssds > 0).

Type: string Default: "n1-standard-4"

Valid GCP Machine Type. See https://cloud.google.com/compute/docs/machine-types

Type: integer Default: 10

The maximum number of machines in the machine block.

Value must be greater than or equal to 1

Type: integer Default: 0

The number of local SSDs to attach to each instance in the machine block. See https://cloud.google.com/compute/docs/disks#localssds

Value must be greater than or equal to 0 and less than or equal to 8

Type: string

The name prefix for machines in the machine block.

Must match regular expression: ^((?!_)[a-z]([-a-z0-9]*[a-z0-9]))+$
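To show which names the anchored pattern above accepts, here is a minimal Python sketch. The helper name `is_valid_name` and the sample names are hypothetical; the pattern itself is copied from the schema text.

```python
import re

# Pattern copied verbatim from the schema text above.
NAME_PATTERN = re.compile(r"^((?!_)[a-z]([-a-z0-9]*[a-z0-9]))+$")


def is_valid_name(name: str) -> bool:
    """Return True if the name matches the anchored pattern."""
    return NAME_PATTERN.search(name) is not None


# Hypothetical sample names.
for candidate in ["n1-block", "my_block", "Block", "abc-", "a"]:
    print(candidate, is_valid_name(candidate))
```

Under this pattern a name must start with a lowercase letter, end with a lowercase letter or digit, contain only lowercase letters, digits, and hyphens, and be at least two characters long.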

Type: boolean Default: false

Flag to disable/enable preemptible compute instances in the machine block. See https://cloud.google.com/compute/docs/instances/preemptible

Type: integer Default: 0

The number of static compute instances in the machine block. Must be less than or equal to maxnodecount.

Value must be greater than or equal to 0
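The relationship between the static instance count and the maximum machine count spans two properties, so it may be worth checking explicitly. The sketch below is one way to do that in Python; `check_node_counts` and its parameter names are hypothetical, and only the bounds stated in this section are enforced.

```python
def check_node_counts(static_count: int, max_count: int) -> list:
    """Return a list of problems with a machine block's node counts."""
    errors = []
    if max_count < 1:
        errors.append("maximum machine count must be at least 1")
    if static_count < 0:
        errors.append("static instance count must be at least 0")
    if static_count > max_count:
        errors.append("static instance count must not exceed the maximum machine count")
    return errors
```

With the documented defaults (0 static instances, maximum of 10), the check returns an empty list.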

Type: string

Valid VPC subnet that hosts the compute instances in the machine block. Format is https://www.googleapis.com/compute/v1/projects/{vpc-project-id}/regions/{region}/subnetworks/{subnetwork}

Type: string

Valid GCP Zone to host the compute instances in the machine block. See https://cloud.google.com/compute/docs/regions-zones

Type: string Default: "UNLIMITED"

Maximum run time for jobs in the compute partition. See https://slurm.schedmd.com/slurm.conf.html

Type: string

Name of the partition (passed to the --partition flag for salloc, sbatch, and srun).

Must match regular expression: ^((?!_)[a-z]([-a-z0-9]*[a-z0-9]))+$

Type: string

GCP project ID to host compute instances in the compute partition.

Must match regular expression: ^((?!_)[a-z]([-a-z0-9]*[a-z0-9]))+$

Type: array of object

An array of Slurm account profiles. See https://slurm.schedmd.com/accounting.html

Each item of this array must be:

Type: object

Type: array of string

Must contain at least 1 item

Each item of this array must be:

Type: string

Type: string

Slurm account name.

Must match regular expression: ^((?!_)[a-z]([-a-z0-9]*[a-z0-9]))+$

Type: array of string

POSIX usernames to align with the parent Slurm account and compute partitions.

Must contain at least 1 item

Each item of this array must be:

Type: string

Type: array of object

An array of NFS, Lustre, and GCS resources to mount to all instances in the cluster.

Each item of this array must be:

Type: object

Type: string

POSIX group name or gid for group permissions on the mount directory.

Type: string

Location on the cluster to mount the remote filesystem.

Type: string

Mount options. Defaults to rw,hard,intr.

Type: string

POSIX user name or uid for user permissions on the mount directory.

Type: string

3-digit POSIX permissions to apply to mount directory.
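A three-digit permission string can be validated and converted to a numeric mode as sketched below in Python. This assumes the digits are octal (each in the range 0 to 7), as is conventional for POSIX modes; the helper name `parse_permissions` is hypothetical.

```python
import re


def parse_permissions(text: str) -> int:
    """Convert a 3-digit octal permission string such as "755" to an int mode.

    Assumption: each digit is an octal digit (0-7).
    """
    if not re.fullmatch(r"[0-7]{3}", text):
        raise ValueError(f"expected three octal digits, got {text!r}")
    return int(text, 8)
```

The returned value can be passed to functions that take a numeric mode, such as `os.chmod`.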

Type: enum (of string)

Mount protocol used for the remote storage.

Must be one of:

  • "nfs"
  • "lustre"
  • "gcsfuse"

Type: string

Full path to the remote server directory to mount. For example, example.edu:/network_storage

Type: string

Munge authentication key to use for Slurm communication authentication.

Type: string

Name of the cluster. Prefixes the controller and login instance names.

Must match regular expression: [a-z]([-a-z0-9]*[a-z0-9])

Type: object

Configuration settings for the Slurm Database.

Type: string

The name of the CloudSQL instance.

Type: string

The internal IP address of the CloudSQL instance on the same VPC network.

Type: integer Default: 6819

The port used for communication with the Slurm database

Type: integer

The amount of time (in seconds) to wait before powering down ephemeral compute instances.

Value must be greater than or equal to 0

Type: array of string

Network tags to apply to all instances in the cluster.

Each item of this array must be:

Type: string