.. |br| raw:: html

.. _prepare-gcp-environment:

##########################################
Quickstart I: Prepare Your GCP Environment
##########################################

GCHP simulations are computationally intensive. To get the multi-core
processing and fast networking that efficient runs require, we strongly
recommend deploying GCHP on a Slurm-managed HPC cluster built with Google's
`Cluster Toolkit `_.

This quickstart walks you through preparing your Google Cloud Platform (GCP)
environment from scratch, before you configure
:ref:`Cluster Toolkit ` for GCHP. None of the steps below incurs compute
charges by itself, but several of them (especially quota increases) can take
**24-48 hours**, so complete them well before you plan to run jobs.

.. warning::

   Quota increases for H4D and large CPU pools can take 24-48 hours to be
   approved by Google Cloud Support. If your cluster fails to spin up compute
   nodes and the logs show a ``QUOTA_EXCEEDED`` error, request more quota in
   **IAM & Admin -> Quotas & System Limits** in the Google Cloud Console.
   A ``ZONE_RESOURCE_POOL_EXHAUSTED`` error instead means the zone is
   temporarily out of capacity for that machine type. More quota does not
   resolve it, so retry later or use another supported zone.

================================================================================
1. Create a GCP Project and Link Billing
================================================================================

A :ref:`Project ` is the top-level container for all GCP resources. Every VM,
disk, and IP address belongs to exactly one project, and billing is tracked
per project.

1. Sign in at https://console.cloud.google.com.
2. Open the project selector (top bar) -> **NEW PROJECT**.
3. Give the project a name like ``gchp-prod`` and note the assigned
   **Project ID** (e.g., ``gchp-prod-414000``). The Project ID, not the
   project name, is what every ``gcloud`` command uses.
4. Open **Billing** in the left navigation. Link an active billing account to
   the new project.

New accounts receive a $300 free credit valid for 90 days, which is useful for
the initial cluster bring-up.

.. tip::

   To watch your spend in near real time, enable **Billing export to
   BigQuery** under *Billing -> Billing export*. Once about 24 hours of data
   have accumulated, you can query exact daily spend per SKU with SQL. See
   :ref:`set-up-gcp-cluster` for the example query.

================================================================================
2. IAM Permissions for Cluster Creation
================================================================================

Cluster Toolkit uses `Terraform `_ to provision resources on your behalf,
including VPCs, Compute Engine instances, Filestore volumes, and
:ref:`IAM ` service accounts. The user who runs ``gcluster create`` must have
permission to create all of those.

The simplest setup is to give your user the **Owner** role on the project
during initial bring-up, then narrow to per-resource roles afterward. For a
least-privilege configuration, the user needs the following IAM roles:

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Role
     - Purpose
   * - ``roles/compute.admin``
     - Create VMs, disks, images, networks, subnets
   * - ``roles/file.editor``
     - Create and manage Filestore NFS volumes
   * - ``roles/iam.serviceAccountAdmin``
     - Create the cluster's service account
   * - ``roles/iam.serviceAccountUser``
     - Attach service accounts to VMs
   * - ``roles/resourcemanager.projectIamAdmin``
     - Bind service accounts to roles
   * - ``roles/storage.admin``
     - Read/write to Cloud Storage if used

Apply them with:

.. code-block:: bash

   PROJECT_ID=""    # your Project ID (not the project name)
   USER_EMAIL=""    # the Google account that will run gcluster

   # Grant each role from the table above to that user
   for ROLE in compute.admin file.editor iam.serviceAccountAdmin \
               iam.serviceAccountUser resourcemanager.projectIamAdmin \
               storage.admin; do
     gcloud projects add-iam-policy-binding "$PROJECT_ID" \
       --member="user:$USER_EMAIL" \
       --role="roles/$ROLE"
   done

.. warning::

   Missing permissions often surface only at a later stage. If you encounter
   an error such as ``Permission 'compute.networks.create' denied`` during
   cluster creation, your user lacks a required role. Re-run the binding
   above, or temporarily switch to the **Owner** role until you finish the
   initial setup.

================================================================================
3. Enable Required APIs
================================================================================

On a fresh project, most GCP APIs are disabled by default. Enable the ones
GCHP needs:

.. code-block:: bash

   gcloud config set project "$PROJECT_ID"

   for API in compute.googleapis.com \
              file.googleapis.com \
              cloudresourcemanager.googleapis.com \
              servicenetworking.googleapis.com \
              storage.googleapis.com \
              cloudbilling.googleapis.com \
              deploymentmanager.googleapis.com \
              iam.googleapis.com; do
     gcloud services enable "$API"
   done

Enabling each API takes about 30 seconds. Cluster Toolkit will refuse to
create the cluster until all of these are active.

================================================================================
4. GCP Service Quotas (vCPU, H4D, Filestore)
================================================================================

Fresh GCP projects have very low default quotas, usually not enough to run
GCHP at production scale. Request increases at least one week before you plan
to run.

Open **IAM & Admin -> Quotas & System Limits** in the Cloud Console, filter by
quota name, and request the values below for your target region (the rest of
this guide uses ``us-central1``):

.. list-table::
   :header-rows: 1
   :widths: 32 14 35 19

   * - Quota name
     - Region
     - Why
     - Suggested
   * - ``CPUs (all regions)``
     - Global
     - Total cluster cores
     - >= 500
   * - ``C2 CPUs``
     - us-central1
     - c2-standard-60 burst capacity
     - >= 240
   * - ``H4D CPUs``
     - us-central1
     - h4d-standard-192 (1 node = 192 cores)
     - >= 384 (2 nodes)
   * - ``In-use IP addresses``
     - us-central1
     - NAT + RDMA per-node IPs
     - >= 16
   * - ``Persistent Disk SSD (GB)``
     - us-central1
     - Controller + login boot disks
     - >= 500
   * - ``Filestore Basic HDD storage (TB)``
     - us-central1
     - NFS for ``/shared``
     - >= 2

.. note::

   H4D and Falcon RDMA capacity is offered only in specific zones:
   ``asia-southeast1-a``, ``europe-west4-b``, ``us-central1-a``,
   ``us-central1-b``, and ``us-west4-a`` as of 2026-06. Verify available zones
   with the following command before requesting H4D quota:

   .. code-block:: bash

      gcloud compute machine-types list --filter="name=h4d-standard-192" \
        --format="value(zone)"

================================================================================
5. Install gcloud, Terraform, and Cluster Toolkit
================================================================================

Install the three tools on your **local workstation** (not on a cloud VM):

5a. Install gcloud CLI
--------------------------------------------------------------------------------

.. code-block:: bash

   curl https://sdk.cloud.google.com | bash
   exec -l "$SHELL"                        # restart the shell to pick up PATH
   gcloud init                             # picks the project + login
   gcloud auth application-default login   # credentials for Terraform

5b. Install Terraform 1.7+
--------------------------------------------------------------------------------

.. code-block:: bash

   # macOS: Homebrew's core formula no longer tracks current releases,
   # so install from HashiCorp's tap
   brew tap hashicorp/tap && brew install hashicorp/tap/terraform
   # or, after adding the HashiCorp apt repository: sudo apt-get install terraform   # Debian/Ubuntu

5c. Install Cluster Toolkit
--------------------------------------------------------------------------------

.. code-block:: bash

   git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git \
     ~/cluster-toolkit
   cd ~/cluster-toolkit
   make

Verify all three:

.. code-block:: bash

   $ gcloud --version
   Google Cloud SDK 530.0.0
   ...
   $ terraform --version
   Terraform v1.10.5
   $ ~/cluster-toolkit/gcluster --version
   gcluster v1.55.0

================================================================================
6. Cost Sanity Check Before You Continue
================================================================================

The cluster you build in :ref:`set-up-gcp-cluster` has these always-on
baseline costs **before any GCHP job runs**:

.. list-table::
   :header-rows: 1
   :widths: 70 30

   * - Resource
     - Daily cost
   * - Filestore BASIC_HDD 1 TB
     - $6.67
   * - Controller VM (c2-standard-4)
     - $5.03
   * - Login VM (n2-standard-2)
     - $2.33
   * - Disks + IPs
     - ~$2.50
   * - **Idle baseline**
     - **~$16.50/day**

Each H4D node costs approximately **$10/hour** while running and incurs no
compute charge when stopped, although its boot disk still costs about
$0.07/day. Plan to stop the controller, login, and any compute nodes when you
finish work for the day. See the cost-control section of
:ref:`set-up-gcp-cluster` for the exact commands.

================================================================================
You are now ready
================================================================================

When all of the following are in place:

* Project linked to billing
* IAM roles granted
* APIs enabled
* Quotas approved
* ``gcloud``, ``terraform``, and ``gcluster`` installed locally

proceed to :ref:`set-up-gcp-cluster` to build the cluster.
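
The last checklist item can be confirmed from a shell before you move on. The
following is a minimal sketch, assuming ``gcloud`` and ``terraform`` are on
your ``PATH`` and that ``gcluster`` was built in ``~/cluster-toolkit`` as in
step 5c. The helper name ``missing_tools`` is hypothetical and not part of any
of these tools; adjust the paths if your layout differs.

.. code-block:: bash

   #!/usr/bin/env bash
   # Print every tool from the argument list that the shell cannot locate.
   # Accepts bare command names (looked up on PATH) or explicit paths.
   missing_tools() {
     for _tool in "$@"; do
       command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
     done
   }

   # Assumed locations: gcloud and terraform on PATH, gcluster in the
   # clone directory from step 5c.
   missing="$(missing_tools gcloud terraform "$HOME/cluster-toolkit/gcluster")"

   if [ -z "$missing" ]; then
     echo "All three tools found."
   else
     echo "Not found:"
     echo "$missing"
   fi

The script only reports which tools are missing and never exits with an error,
so it is safe to paste into an interactive session. It does not check versions,
API status, or quota approval, which still need to be confirmed as described in
the sections above.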