10 Ways to Reduce Gen AI Costs: Insights from the Gartner® Report

By Rhea Jain

Published: June 4, 2026

Generative AI has rapidly moved from experimentation to execution and is now embedded across products, operations, and customer experiences. However, as enterprises scale adoption, a structural issue is emerging: AI usage is growing faster than the mechanisms required to control cost. What begins as a contained pilot quickly expands into multiple teams building independently, applications invoking multiple models, and agentic workflows executing multi-step reasoning. The result is not just higher spend, but increasingly unpredictable and compounding costs across the organization.

This challenge is highlighted in Gartner's “10 Best Practices for Optimizing Generative and Agentic AI Costs,” which examines how architectural decisions and lack of operational discipline drive cost overruns at scale. As the report notes, “Through 2028, at least 50% of GenAI projects will overrun their budgeted costs due to poor architectural choices and lack of operational know-how.” This is not a tooling problem; it is fundamentally an architectural and operating model failure.

How We Believe Gartner Is Defining This Shift

Gartner's “10 Best Practices for Optimizing Generative and Agentic AI Costs” focuses on how enterprises must rethink cost, governance, and operational control as AI systems move into production.

TrueFoundry is mentioned in this report in the context of AI gateways, an emerging control layer for managing cost, reliability, and governance across AI workloads.

Read the full report here

Gartner highlights the scale of the challenge clearly: “Organizations transitioning from GenAI pilots to production experience a rude awakening when it comes to costs. Creating a production-ready GenAI system can be orders of magnitude more expensive than running a pilot.” This marks the inflection point: AI cost becomes a runtime problem, not a build-time concern, driven by how systems are orchestrated, governed, and operated at scale.


Why Generative AI Costs Escalate in Production

To understand the problem, it is important to break down how AI systems behave at scale.

1 Inference Becomes the Dominant Cost Layer

Unlike traditional systems, AI incurs cost every time it is used.

Gartner highlights this shift:

“Through 2028, the aggregated costs of model inference will be at least 70% of the total model lifetime costs…”

This fundamentally changes how cost must be managed.

2 Agentic Workflows Multiply Cost per Request

Modern AI systems are not single-step.

A single request can trigger:

  • multiple model calls
  • tool interactions
  • chained reasoning

This creates non-linear cost expansion.
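To make the non-linear growth concrete, here is a minimal sketch that compares a single model call with a five-step agentic loop. It assumes each step re-sends the original prompt plus all earlier outputs as context, which is a common but not universal pattern. The prices, token counts, and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical USD prices per 1K tokens; real prices vary by provider and model.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015


@dataclass
class ModelCall:
    input_tokens: int
    output_tokens: int

    def cost(self) -> float:
        """Cost of one call: input and output tokens are priced separately."""
        return (
            self.input_tokens / 1000 * PRICE_PER_1K_INPUT
            + self.output_tokens / 1000 * PRICE_PER_1K_OUTPUT
        )


def agentic_request_calls(steps: int, base_prompt: int, output_per_step: int) -> list[ModelCall]:
    """Assumed pattern: every step re-sends the base prompt plus all earlier outputs."""
    calls = []
    context = base_prompt
    for _ in range(steps):
        calls.append(ModelCall(input_tokens=context, output_tokens=output_per_step))
        context += output_per_step  # earlier output becomes later input
    return calls


single = ModelCall(input_tokens=1000, output_tokens=500).cost()
agentic = sum(
    c.cost() for c in agentic_request_calls(steps=5, base_prompt=1000, output_per_step=500)
)
print(f"single call: ${single:.4f}, 5-step agent: ${agentic:.4f}, ratio: {agentic / single:.2f}x")
```

Under these assumptions the five-step workflow costs more than five times the single call, because the input context grows with every step.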

3 Fragmented Adoption Drives Inefficiency

In most enterprises:

  • teams adopt models independently
  • no shared governance exists
  • usage patterns are inconsistent

This leads to:

  • duplicated usage
  • poor model selection
  • unnecessary cost overhead

4 Lack of Runtime Governance Leads to Cost Sprawl

Without centralized control:

  • no quotas are enforced
  • no routing decisions are made
  • no cost visibility exists

This is where cost becomes unmanageable at scale.

The Architectural Shift: From Model Access to AI Control Plane

The recommendations in the Gartner report point to a clear shift.

This is not about better models.

It is about controlling how models are used in production.

Key practices include:

1 Centralized Access to AI Systems

A single control layer to manage all model and tool interactions.

2 Intelligent Model Routing

Selecting models dynamically based on cost, latency, and performance.
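As a rough illustration of cost-aware routing, the sketch below picks the cheapest model that clears a quality floor and a latency ceiling. The `ModelProfile` fields, model names, prices, and scores are all hypothetical, and this is not any vendor's routing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_1k_tokens: float  # hypothetical blended price in USD
    p95_latency_ms: int        # hypothetical measured latency
    quality_score: float       # hypothetical 0-1 score from an internal evaluation


# Illustrative catalog; the names and numbers are made up.
CATALOG = [
    ModelProfile("small-model", 0.0005, 300, 0.70),
    ModelProfile("medium-model", 0.003, 700, 0.85),
    ModelProfile("large-model", 0.015, 1500, 0.95),
]


def route(min_quality: float, max_latency_ms: int, catalog=CATALOG) -> ModelProfile:
    """Return the cheapest model that meets the quality floor and latency ceiling."""
    eligible = [
        m for m in catalog
        if m.quality_score >= min_quality and m.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise LookupError("no model satisfies the requested constraints")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# A simple classification task tolerates lower quality, so it lands on the cheapest model.
print(route(min_quality=0.6, max_latency_ms=1000).name)
```

In practice the quality and latency figures would come from evaluation runs and live telemetry rather than a static table.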

3 Governance and Policy Enforcement

Applying quotas, limits, and guardrails across all usage.
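One way to picture quota enforcement is a per-team budget that is checked before a request is forwarded to a model. The following is a minimal in-memory sketch under that assumption; the `TeamBudget` class and its method names are hypothetical, and a production system would also need persistence, period resets, and concurrency control.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push a team past its spending limit."""


class TeamBudget:
    """Tracks spend per team against a fixed limit for one billing period."""

    def __init__(self, limits_usd: dict[str, float]):
        self._limits = dict(limits_usd)
        self._spent = {team: 0.0 for team in limits_usd}

    def authorize(self, team: str, estimated_cost_usd: float) -> None:
        """Record the spend if it fits in the budget; otherwise reject the request."""
        if team not in self._limits:
            raise KeyError(f"unknown team: {team}")
        if self._spent[team] + estimated_cost_usd > self._limits[team]:
            raise BudgetExceeded(f"{team} would exceed its budget")
        self._spent[team] += estimated_cost_usd

    def remaining(self, team: str) -> float:
        return self._limits[team] - self._spent[team]


budget = TeamBudget({"search": 1.00})
budget.authorize("search", 0.60)      # accepted
try:
    budget.authorize("search", 0.50)  # rejected: 0.60 + 0.50 exceeds 1.00
except BudgetExceeded as exc:
    print(f"blocked: {exc}")
```

The key design choice is that the check happens before the model call, so a rejected request generates no inference cost.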

4 End-to-End Observability

Tracking usage, performance, and cost at a granular level.

5 Cost Optimization Mechanisms

Reducing redundant inference through caching and reuse.
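The simplest form of caching is an exact-match lookup: if the same model receives the same prompt again, the stored response is returned instead of paying for a second inference. The sketch below assumes that approach. `ResponseCache` and `fake_model` are hypothetical names, and semantic (similarity-based) caching, expiry, and eviction are out of scope here.

```python
import hashlib
import json


class ResponseCache:
    """Exact-match cache keyed on model name and prompt."""

    def __init__(self):
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, call_model):
        """Return a cached response, or invoke call_model once and store the result."""
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = call_model(model, prompt)
        self._store[key] = response
        return response


calls = []


def fake_model(model, prompt):
    calls.append(prompt)  # stand-in for a billable provider call
    return f"answer to: {prompt}"


cache = ResponseCache()
first = cache.get_or_call("small-model", "What is an AI gateway?", fake_model)
second = cache.get_or_call("small-model", "What is an AI gateway?", fake_model)
print(f"provider calls: {len(calls)}, cache hits: {cache.hits}")
```

Exact-match caching only helps when requests repeat verbatim, so the hit rate depends heavily on the workload.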

Gartner formalizes this shift:

“A new category of tools called AI gateways can help control costs by enforcing policies… and by providing features such as caching and model routing to reduce costs.”

This defines a new layer: the AI control plane.

A Gartner® infographic outlining 10 best practices for GenAI cost optimization, categorized into Robust Architecture, Efficient AI Operations, and Effective Change Management.

Where TrueFoundry Fits

We believe that the direction Gartner outlines points to a clear requirement: a centralized control layer that governs how AI is used across the enterprise.

TrueFoundry is mentioned in this report as part of this emerging AI gateway ecosystem.

TrueFoundry operates at the layer where AI usage occurs, which is also where cost is generated.

1 From Reactive Tracking to Proactive Control

Instead of:

  • tracking cost after it happens

TrueFoundry enables:

  • controlling usage before it scales

2 Dynamic Optimization at Runtime

  • Route requests across models based on cost-performance trade-offs
  • Apply budgets, quotas, and rate limits
  • Optimize usage through caching and reuse

3 Full Visibility Across AI Systems

  • Token-level cost tracking
  • Request-level tracing
  • Team and application-level analytics
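To show what token-level tracking rolled up to team and application views might look like, here is a small sketch that prices individual request records and aggregates them along a chosen dimension. The record fields, model names, and prices are hypothetical, and this is not TrueFoundry's data model or API.

```python
from collections import defaultdict

# Hypothetical USD prices per 1K tokens: (input, output).
PRICES = {"small-model": (0.0005, 0.0015), "large-model": (0.003, 0.015)}


def request_cost(record: dict) -> float:
    """Price one request from its token counts and model."""
    price_in, price_out = PRICES[record["model"]]
    return (
        record["input_tokens"] / 1000 * price_in
        + record["output_tokens"] / 1000 * price_out
    )


def cost_by(records: list[dict], dimension: str) -> dict[str, float]:
    """Sum request costs grouped by a field such as 'team' or 'app'."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += request_cost(record)
    return dict(totals)


records = [
    {"team": "support", "app": "chatbot", "model": "large-model",
     "input_tokens": 2000, "output_tokens": 1000},
    {"team": "support", "app": "summarizer", "model": "small-model",
     "input_tokens": 4000, "output_tokens": 1000},
    {"team": "search", "app": "reranker", "model": "small-model",
     "input_tokens": 10000, "output_tokens": 0},
]

for team, total in cost_by(records, "team").items():
    print(f"{team}: ${total:.4f}")
```

The same records can be grouped by `"app"` or `"model"` to see which workloads drive spend.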

4 Governance at Enterprise Scale

  • Centralized access control
  • Policy enforcement across all AI interactions
  • Guardrails for safe and compliant usage

5 Enterprise-Ready Deployment

  • Works across cloud and on-prem environments
  • Supports multi-model, multi-provider strategies
  • Avoids vendor lock-in

This shifts the operating model from:

“What is our AI spend?”

to

“Are we using AI efficiently, and should this request even be executed?”

Why This Matters for CXOs

Generative AI is entering its second phase.

The first phase was about access.

The next phase is about control and economics.

At the same time, pricing models are evolving:

“By 2030, at least 40% of enterprise SaaS spend will shift toward usage-, agent- or outcome-based pricing.”

This makes cost:

  • a financial decision
  • a governance problem
  • a strategic differentiator

Organizations that introduce control at the runtime layer will:

  • improve cost predictability
  • reduce unnecessary spend
  • scale AI systems responsibly

Final Perspective

Gartner is defining generative AI cost as a systems-level challenge rooted in runtime behavior, not model selection. At scale:

  • every request carries cost
  • every workflow multiplies usage
  • every inefficiency compounds

The enterprises that succeed will not be those that adopt AI faster.

They will be the ones that introduce control, governance, and economic discipline into how AI systems operate.

The advantage will not come from access to models, but from control over how those models are used.

Explore Further

Read the full Gartner report

Learn more about TrueFoundry: https://www.truefoundry.com

Disclaimer

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact.

Gartner, 10 Best Practices for Optimizing Generative and Agentic AI Costs, By Arun Chandrasekaran et al., 20 March 2026

GARTNER is a trademark of Gartner, Inc. and/or its affiliates.


Frequently asked questions

How to optimize generative AI costs?

You can optimize generative AI costs by using the right model for each task and avoiding unnecessary usage. For example, simple tasks do not require large and expensive models, so choosing smaller ones can reduce spend. In addition, keeping prompts focused helps avoid extra token usage that does not add value. Similarly, limiting response length prevents paying for unnecessary output. Over time, regularly tracking usage makes it easier to identify where costs are increasing and take corrective action.

How to reduce LLM costs?

You can reduce LLM costs by cutting down on long prompts and repeated queries. Since longer inputs increase token usage, keeping them concise helps control costs. At the same time, repeated queries without caching can lead to avoidable spending. Using smaller models for basic tasks is another effective way to reduce costs without impacting performance. Overall, maintaining control over both input and output length ensures more efficient and predictable usage.

What is the role of an AI gateway in optimizing costs?

An AI gateway helps optimize costs by controlling how different AI models are used. It routes requests to the most cost-effective model based on the task, so simple queries do not end up using expensive models. This prevents unnecessary spend and improves efficiency. With TrueFoundry, the AI gateway goes a step further by giving teams a unified layer to connect, observe, and govern AI usage across applications. It also provides clear visibility into token usage, enables smart routing, and helps enforce limits to keep spending under control.

Can I use generative AI for free?

Yes, you can use generative AI for free through limited plans offered by providers. These plans are useful for testing and small-scale usage. However, they come with restrictions on usage and features. Once usage increases, you will need to move to paid plans.

Why is generative AI so expensive?

Generative AI is expensive because it requires high computing power for every request. Large models run on costly infrastructure, which increases overall expenses. Costs also come from embeddings, integrations, and repeated workflows. This makes the total cost higher than just token usage.

What are the best practices for AI cost optimization?

The best practices for AI cost optimization include using the smallest effective model and reducing unnecessary usage. Keeping prompts clear and output limited helps control token usage. Monitoring usage regularly helps identify cost-heavy areas. Reducing repeated tasks and optimizing workflows also improves efficiency.

What affects LLM inference cost?

LLM inference cost is affected by model size, token usage, and request frequency. Larger models cost more because they require more computing power. Longer prompts and outputs increase token usage and cost. Frequent or multi-step requests can quickly increase overall expenses.

How does token usage impact AI costs?

Token usage impacts AI costs by determining how much you are charged per request. Every input and output is measured in tokens. Longer prompts and responses lead to higher costs. Managing token usage carefully helps keep overall spending under control.

What is the cost of running LLMs in production?

The cost of running LLMs in production includes token usage, infrastructure, and system-related expenses. You also need to account for storage, monitoring, and integrations. Token costs are often only a part of the total spend. As usage grows, these additional costs increase significantly.

What is agentic AI and how does it affect costs?

Agentic AI is a system where AI performs tasks through multiple steps and decisions. It affects costs by increasing the number of model calls required to complete a task. Each step adds to token usage and compute cost. This makes it more expensive than single-step AI interactions.
