Multi-Cloud GPU Orchestration with TrueFoundry: A Reference Architecture for Hyperscalers and Specialized Clouds

By Boyu Wang

Updated: May 13, 2026

GPU capacity is one of the hardest constraints for AI teams right now. Provisioning Amazon EC2 P5 instances or Azure ND H100 v5-series VMs can run into quota limits, regional capacity constraints, or commercial commitments that are difficult to absorb. That has made specialized GPU clouds — CoreWeave, Lambda, Fluidstack, and several others — viable production targets, not just overflow capacity. Each of these providers offers a managed Kubernetes path for GPU workloads: CoreWeave Kubernetes Service (CKS), Lambda Managed Kubernetes (MK8s) on a 1-Click Cluster, and Fluidstack managed Kubernetes.

Running across these alongside a hyperscaler footprint creates real operational complexity: separate dashboards, separate identity systems, separate observability, separate deployment flows. TrueFoundry's role here is to attach each of these Kubernetes clusters to a single Control Plane, present them as deploy targets in one UI, and provide a consistent K8s operational layer on top — without replacing the cluster-native automation each provider already ships.

This post walks through what that actually looks like: the architecture, what attaching a cluster requires, where the platform's automation ends and the provider's begins, and the practical patterns we recommend.

The Architecture: One Control Plane, Many Compute Planes

TrueFoundry uses a split-plane architecture. The Control Plane (TrueFoundry-managed or self-hosted) holds metadata, RBAC, deployment manifests, and the UI. The Compute Plane is your own Kubernetes cluster. Multiple Compute Planes can connect to a single Control Plane — meaning an EKS cluster, an AKS cluster, a CKS cluster, an MK8s cluster, and a Fluidstack managed K8s cluster can all report to the same dashboard.

The tfy-agent runs in each cluster and opens a secure outbound WebSocket to the Control Plane. The agent streams cluster state, while the agent proxy lets the Control Plane apply Kubernetes changes through that outbound connection without requiring an inbound endpoint on the cluster. Workloads, data-plane traffic, and provider credentials remain inside each cluster's cloud account, and traffic does not flow between Compute Planes through the Control Plane — they remain independent clouds with independent identities.

Figure 1. Multi-cloud architecture. One Control Plane, one outbound agent connection per cluster, and no data-plane traffic between clusters through the platform. Each Compute Plane keeps its own identity boundary, storage, and provider-managed components.

What "Attaching a Cluster" Actually Requires

The agent install itself is a single helm command — but it sits at the end of a setup that has real prerequisites. For any cluster (hyperscaler or specialized) to attach cleanly, the cluster needs:

  • Kubernetes 1.28+ with headroom for roughly 250 nodes / 4,096 pods, depending on the intended workload profile
  • Outbound egress to the container registries TrueFoundry pulls from: public.ecr.aws, quay.io, ghcr.io, tfy.jfrog.io, docker.io/natsio, nvcr.io, registry.k8s.io
  • A wildcard domain (e.g. *.lambda-pool.example.com) and a TLS certificate — cert-manager + Let's Encrypt is the documented pattern
  • A working load balancer or ingress path. Managed hyperscale K8s usually provides this through the cloud load balancer integration; on bare-metal or specialized clusters, confirm the provider-supported ingress and IP allocation path
  • Persistent storage support for volumes and artifacts, typically through the provider's CSI-backed block, filesystem, or object-storage integrations
  • A reachable container registry and artifact store for image pulls, build outputs, and workflow artifacts
  • Node labels on generic / specialized clusters: truefoundry.com/nodepool=<pool-name> on every node, and truefoundry.com/gpu_type=<GPU_TYPE> on GPU nodes (TrueFoundry auto-discovers node pools on EKS/GKE/AKS only)
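The node-label prerequisite is easy to miss on a large pool, so it can help to check it before attaching. The following is a minimal sketch, not a TrueFoundry tool: it assumes you have saved the output of `kubectl get nodes -o json` and pass its `items` list in, and it assumes a GPU node is one that advertises the `nvidia.com/gpu` resource in its capacity. The function name `missing_labels` and the sample nodes are hypothetical.

```python
from typing import Iterable

NODEPOOL_LABEL = "truefoundry.com/nodepool"
GPU_TYPE_LABEL = "truefoundry.com/gpu_type"
# Assumption: GPU nodes are identified by the extended resource that the
# NVIDIA device plugin advertises in node capacity.
GPU_RESOURCE = "nvidia.com/gpu"


def missing_labels(nodes: Iterable[dict]) -> dict:
    """Map node name -> list of required labels that are absent or empty.

    `nodes` is the "items" list from `kubectl get nodes -o json`.
    """
    problems = {}
    for node in nodes:
        meta = node.get("metadata", {})
        labels = meta.get("labels") or {}
        capacity = node.get("status", {}).get("capacity") or {}
        required = [NODEPOOL_LABEL]
        # Capacity values are strings in the Kubernetes API, e.g. "8".
        if int(capacity.get(GPU_RESOURCE, "0")) > 0:
            required.append(GPU_TYPE_LABEL)
        missing = [key for key in required if not labels.get(key)]
        if missing:
            problems[meta.get("name", "<unnamed>")] = missing
    return problems


# Hypothetical sample data in the shape of the Kubernetes Node object.
sample_nodes = [
    {"metadata": {"name": "cpu-0", "labels": {NODEPOOL_LABEL: "system"}},
     "status": {"capacity": {"cpu": "32"}}},
    {"metadata": {"name": "gpu-0", "labels": {NODEPOOL_LABEL: "h100"}},
     "status": {"capacity": {GPU_RESOURCE: "8"}}},
    {"metadata": {"name": "gpu-1", "labels": {}},
     "status": {"capacity": {GPU_RESOURCE: "8"}}},
]

for name, missing in missing_labels(sample_nodes).items():
    print(f"{name}: missing {', '.join(missing)}")
```

Any node the check reports can be fixed with `kubectl label node <node-name> truefoundry.com/nodepool=<pool-name>`, and the same for the GPU type label on GPU nodes.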

On managed hyperscale K8s, TrueFoundry's OpenTofu/Terraform modules cover most of this. On specialized clouds, you use the provider's managed K8s offering directly — provision through their console, prepare the prerequisites, then attach. The exact agent install command is generated by the platform UI when you click Attach Existing Cluster; it typically follows this shape:

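# Register the TrueFoundry chart repository and refresh the local chart index.
helm repo add truefoundry https://truefoundry.github.io/infra-charts/
helm repo update

# Install (or upgrade) the agent. Copy the real values from the command the UI generates.
helm upgrade --install tfy-agent truefoundry/tfy-agent \
  --set tenantName=my-org \
  --set clusterName=lambda-h100-pool \
  --set controlPlaneURL=https://<YOUR_CONTROL_PLANE> \
  --set clusterTokenSecret=<YOUR_CLUSTER_TOKEN_SECRET>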

Specialized Cloud Specifics: the Addon-Overlap Problem

This is the part many architecture posts gloss over. The specialized clouds named earlier already provide Kubernetes components that overlap with parts of TrueFoundry's default addon stack. If both sides install the same component, you can get conflicts that range from duplicated dashboards to unsupported GPU operator deployments.

CoreWeave is explicit about this: CoreWeave manages the NVIDIA GPU Operator on CKS clusters and warns against double-installing it. The CoreWeave-managed deployment is the only supported one, so disable TrueFoundry's GPU Operator addon when attaching a CKS cluster.

Concretely:

  • CoreWeave CKS includes a CoreWeave-managed NVIDIA GPU Operator on recent clusters, Cilium networking, storage integrations, DPU-based infrastructure, and CoreWeave observability. When attaching, disable TrueFoundry's GPU Operator addon and review any observability overlap.
  • Lambda MK8s provides GPU and InfiniBand/RDMA support, shared persistent storage through the lambda-shared StorageClass, NVIDIA DCGM Grafana dashboards, and automated node remediation. Disable TrueFoundry's GPU Operator addon if Lambda is already managing GPU enablement. The provider's DCGM dashboard is separate from TrueFoundry's observability and can run alongside it.
  • Fluidstack managed Kubernetes advertises support for GPU Operator and Network Operator, Ray, Volcano, and Kueue for batch scheduling, Atlas-managed storage, and cluster-health observability. Disable TrueFoundry's GPU Operator addon when the provider is already managing it. The provider's batch scheduling stack is complementary to, rather than a replacement for, workflow orchestration.

The Attach Existing Cluster form has a Cluster Addons section where you toggle off any addon the provider already supplies. This is a one-time decision per cluster.
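One way to make the toggle decision repeatable is to write it down as data before opening the form. The sketch below is illustrative only: the addon identifiers (`gpu-operator`, `prometheus`, `argocd`, `argo-workflows`) are hypothetical stand-ins for whatever the Cluster Addons section actually lists, and the CKS profile is a simplified reading of the description above, not an authoritative inventory of what CoreWeave manages.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the default addon stack.
DEFAULT_ADDONS = frozenset({"gpu-operator", "prometheus", "argocd", "argo-workflows"})


@dataclass(frozen=True)
class ProviderProfile:
    """What a provider already ships on its managed Kubernetes clusters."""
    manages: frozenset               # components the provider installs and owns
    review: frozenset = frozenset()  # overlaps that can coexist but deserve a look


def plan_addons(default_addons: frozenset, profile: ProviderProfile) -> dict:
    """Split the default addons into install / disable / review lists."""
    return {
        "install": sorted(default_addons - profile.manages),
        "disable": sorted(default_addons & profile.manages),
        "review": sorted((default_addons & profile.review) - profile.manages),
    }


# Assumption: simplified profile for a CKS cluster, based on the text above.
cks = ProviderProfile(
    manages=frozenset({"gpu-operator"}),
    review=frozenset({"prometheus"}),  # provider observability may overlap
)

plan = plan_addons(DEFAULT_ADDONS, cks)
for action, addons in plan.items():
    print(f"{action}: {', '.join(addons) or '-'}")
```

Keeping one profile per provider next to your infrastructure code gives reviewers a record of why a given addon was switched off on a given cluster.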

What TrueFoundry Adds Across Clusters

Once a cluster is attached, the platform layers a consistent operational experience on top of whatever K8s the provider gave you:

  • One UI for every cluster. Every deployment, every job, every service, every workspace — visible across all attached clusters in the same dashboard.
  • Consistent deployment manifest format. Author a service or job once; target a different cluster by changing the cluster_name field — provided the prerequisites on the destination cluster match (matching GPU type, registry access, secrets, storage class).
  • GitOps-versioned delivery via ArgoCD deployed into each cluster, with deployment configuration stored in Git when GitOps is enabled.
  • Per-cluster observability, with Prometheus-based metrics surfaced in the Control Plane UI and optional Grafana for deeper cluster-level dashboards. (Note: this is consolidated operational visibility, not a replacement for a federated long-term metrics backend.)
  • Argo Workflows in each cluster for batch jobs and training runs, with run history and step-level observability surfaced uniformly.
  • Autoscaling for services within each cluster, including request-rate-based scaling, time-based rules, queue-based patterns, and scale-to-zero for suitable workloads.
  • Within-cluster capacity-type placement: workloads can target spot, on-demand, or spot-with-on-demand-fallback capacity where the underlying cloud and node provisioning setup supports it.
  • Workspace-based RBAC and SSO at the platform layer, with consistent permission models regardless of which cluster a workload runs in.
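The manifest-portability point in the list above comes with a condition: the destination cluster has to satisfy the same prerequisites. A minimal sketch of that check follows. It assumes a manifest represented as a plain dictionary; only the `cluster_name` field comes from the text above, while the `requires` block, the destination inventory shape, and the function names are hypothetical and do not reflect TrueFoundry's actual manifest schema.

```python
import copy

# (key in manifest["requires"], key in the destination inventory)
SINGLE_VALUE_CHECKS = (
    ("gpu_type", "gpu_types"),
    ("storage_class", "storage_classes"),
    ("registry", "registries"),
)


def unmet_prerequisites(manifest: dict, dest: dict) -> list:
    """List the requirements of `manifest` that `dest` cannot satisfy."""
    needs = manifest.get("requires", {})
    issues = []
    for need_key, have_key in SINGLE_VALUE_CHECKS:
        wanted = needs.get(need_key)
        if wanted and wanted not in dest.get(have_key, ()):
            issues.append(f"{need_key}={wanted} not available on {dest['name']}")
    for secret in needs.get("secrets", ()):
        if secret not in dest.get("secrets", ()):
            issues.append(f"secret {secret} missing on {dest['name']}")
    return issues


def retarget(manifest: dict, dest: dict) -> dict:
    """Return a copy of `manifest` aimed at `dest`, or raise if prerequisites differ."""
    issues = unmet_prerequisites(manifest, dest)
    if issues:
        raise ValueError("; ".join(issues))
    moved = copy.deepcopy(manifest)  # leave the original manifest untouched
    moved["cluster_name"] = dest["name"]
    return moved
```

Running the check in CI before a deploy surfaces a missing secret or storage class as a readable error rather than as a pod stuck in Pending on the new cluster.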

What TrueFoundry Does Not Do (Yet)

To be fully clear about scope — these are real capabilities customers ask for, but they are not platform features today:

  • Cross-cluster scheduling. When you deploy a job, you target a specific cluster. The platform does not automatically pick the cheapest cluster, route based on real-time capacity, or rebalance running workloads across clusters.
  • Cross-cluster failover. If a Lambda 1CC has hardware issues or a CoreWeave region runs out of capacity, the platform does not automatically retry the job on an EKS reservation. Within-cluster placement and fallback policies can help when the underlying cluster supports them; cross-cluster failover is a different problem and is not shipped.
  • Aggregated GPU capacity pooling. Your H100 capacity on CoreWeave, Lambda, and AWS is tracked as three separate clusters, not as one pooled quota. The UI shows all three; the scheduler treats them as independent.
  • Cost-aware automatic routing. Choosing where to run a job based on real-time provider pricing isn't a platform feature today.

If your use case genuinely requires cross-cluster orchestration — for example, a training job that should burst to the cheapest available cloud — build that decision at the orchestration layer (your CI/CD logic, a small custom scheduler, or a workflow tool like Argo Workflows or Temporal) on top of TrueFoundry's per-cluster deploy API. The platform gives you the consistent deploy primitives across clusters; the routing decision stays with you.
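As a rough illustration of what that user-side routing layer might look like, the sketch below ranks candidate clusters by price among those with enough free GPUs, then tries each in order until one accepts the job. Everything here is an assumption: the `ClusterOption` fields, the capacity and price inputs (which you would have to source yourself), and the `deploy` callable, which stands in for whatever wrapper you write around the per-cluster deploy API. It is not a TrueFoundry feature or client.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ClusterOption:
    name: str                 # cluster name as attached to the Control Plane
    gpu_type: str
    free_gpus: int            # supplied by your own capacity tracking
    usd_per_gpu_hour: float   # supplied by your own pricing data


def rank_clusters(options: Iterable[ClusterOption], gpu_type: str,
                  gpus_needed: int) -> List[ClusterOption]:
    """Eligible clusters, cheapest first (name breaks ties deterministically)."""
    eligible = [o for o in options
                if o.gpu_type == gpu_type and o.free_gpus >= gpus_needed]
    return sorted(eligible, key=lambda o: (o.usd_per_gpu_hour, o.name))


def deploy_with_fallback(options: Iterable[ClusterOption], gpu_type: str,
                         gpus_needed: int, deploy: Callable[[str], bool]) -> str:
    """Try each eligible cluster in price order; return the one that accepted.

    `deploy` is a hypothetical callable that submits the job to the named
    cluster and returns True on success.
    """
    for option in rank_clusters(options, gpu_type, gpus_needed):
        if deploy(option.name):
            return option.name
    raise RuntimeError("no eligible cluster accepted the job")
```

The ranking and the retry loop are kept separate so the policy (cheapest first here) can change without touching the submission logic. A production version would also need to handle partial failures and jobs that start and then fail, which this sketch ignores.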

A Real Workflow: Attaching a Lambda 1-Click Cluster

Here is the concrete sequence, so that expectations are set correctly:

  1. Provision a 1-Click Cluster on Lambda with Managed Kubernetes enabled. Choose the size (16 to 2,000+ GPUs) and the reservation length.
  2. Get cluster admin access via Lambda's authentication flow and verify kubectl get nodes shows your GPU workers as Ready.
  3. Prepare prerequisites in the cluster: configure the wildcard DNS and ingress path, install or verify TLS certificate automation, confirm the lambda-shared StorageClass is present if you need shared storage, and verify egress to TrueFoundry's required container registries.
  4. In TrueFoundry, click "Attach Existing Cluster" and fill in the cluster details. Disable the GPU Operator addon when Lambda is already providing GPU enablement.
  5. Run the generated helm command in your cluster. Wait for the agent and any selected addons to come up — typically 5–10 minutes after the prerequisites are in place.
  6. Configure tolerations in the TrueFoundry workspace targeting this cluster. MK8s nodes have GPU taints that need to be tolerated; the workspace-level toleration config applies them to every job submitted to that cluster.
  7. Verify the cluster shows as connected, then deploy a small test job (a single-GPU vLLM service or a one-node training run) before moving production workloads.

Total time on a prepared cluster is typically 30–60 minutes, dominated by DNS propagation, certificate issuance, and the helm install.
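Step 6 is the one most likely to produce jobs stuck in Pending, so it is worth verifying that the tolerations you configure actually cover the taints on the GPU nodes. The sketch below implements the standard Kubernetes toleration-matching rules for that check. The taint shown in the usage example (`nvidia.com/gpu` with effect `NoSchedule`) is an assumption for illustration; read the real taints from your nodes, for example with `kubectl describe node <node-name>`.

```python
from typing import List


def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key", "")
    if not key:
        # An empty key is only meaningful with Exists and matches any taint key.
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True  # value is ignored for Exists
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(taints: List[dict], tolerations: List[dict]) -> List[dict]:
    """Taints that none of the given tolerations cover."""
    return [t for t in taints
            if not any(tolerates(tol, t) for tol in tolerations)]


# Assumed example values; substitute the taints found on your own nodes.
node_taints = [{"key": "nvidia.com/gpu", "value": "present", "effect": "NoSchedule"}]
workspace_tolerations = [
    {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"},
]

remaining = untolerated(node_taints, workspace_tolerations)
print("all taints tolerated" if not remaining else f"uncovered taints: {remaining}")
```

The check treats every taint as blocking. Taints with the `PreferNoSchedule` effect are only a soft preference in Kubernetes, so an uncovered one of those will not by itself keep a pod from scheduling.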

Practical Comparison

Capability | Hyperscalers (EKS / AKS / GKE) | Specialized (CKS / MK8s / Fluidstack) | What TrueFoundry adds
GPU availability | Capacity quotas, reservation planning, and regional availability constraints | Bare-metal H100 / H200 / B200 / GB200 capacity, subject to provider availability and commercial terms | All attached clusters visible in one UI; deploy targets selectable per workload
Pricing | Published rates, reserved discounts | Provider-specific pricing, often optimized for large GPU clusters | Per-cluster deployment; cross-cluster cost routing is not platform-native
Spot / preemption | Cloud-native spot, on-demand, and reservation constructs; fallback depends on node provisioning setup | Provider-managed (varies) | Configured per cluster; cross-cluster failover is user-side
Storage | Native EBS / EFS, Persistent Disk / Filestore, Azure Disk / Azure Files, plus object-store integrations where configured | Provider-managed storage integrations such as CoreWeave storage, Lambda shared filesystem, and Fluidstack Atlas-managed storage | Use the provider-supported Kubernetes storage abstraction; validate access modes and performance per workload
GPU drivers / Operator | Installed by TrueFoundry's GPU Operator addon when the cluster does not already provide it | Often provider-managed; disable TrueFoundry's addon when the provider already manages GPU enablement | Per-cluster addon toggle in the attach form
Observability | Per-cluster Prometheus + Grafana surfaced in platform UI | TrueFoundry observability plus provider-native dashboards where available, such as Lambda's DCGM Grafana | Consolidated cluster visibility (not federated long-term metrics)
Identity | EKS IRSA / GKE Workload Identity / Azure WI configured per provider's standard | Provider-specific RBAC and IAM | Workspace-based RBAC and SSO at the platform layer; native K8s RBAC stays per-cluster

Conclusion

Multi-cloud GPU strategy is real, and increasingly practical as compute capacity becomes a constraint on AI roadmaps. The pragmatic path is treating each cluster — whether EKS, AKS, GKE, CKS, MK8s, or Fluidstack managed Kubernetes — as a standard Kubernetes attachment, with the platform providing a consistent operational layer on top.

What you get: a single UI across every cluster you've attached, consistent deployment and GitOps patterns regardless of cloud, workload portability at the manifest level (where prerequisites match), and complete data separation across customer cloud accounts. What you don't get today: automatic cross-cluster scheduling, capacity pooling, or cost-aware routing — those decisions stay with you, built on top of the platform's per-cluster deploy API.

If you're evaluating this stack, the natural starting point is attaching two clusters — one hyperscaler and one specialized — and running a representative job on each to validate the operational ergonomics before scaling out.
