Baseten
AI inference platform for production model serving — deploy and run open-source, fine-tuned and custom AI models with low-latency, high-throughput infrastructure across multi-cloud and on-prem; positioned as the inference layer for the application generation of AI built on a multi-model future.
The Business
Baseten builds an AI inference platform for production model serving — deploying, running and scaling open-source, fine-tuned and custom AI models with low-latency, high-throughput infrastructure across multi-cloud and on-prem environments. The product line is anchored on the Baseten Inference Stack (model deployment, autoscaling, observability and the multi-model orchestration surface), the Truss open-source framework for packaging models for production, and a deployment surface that supports open-source model catalogues (Meta Llama, Mistral, DeepSeek), customer fine-tunes and bespoke custom models. The company is privately held — founded 2019 in San Francisco by Tuhin Srivastava, Amir Haghighat, Philip Howes and Pankaj Gupta — and has raised approximately $590M+ of external capital through the January 2026 $300M Series E at a $5B valuation, led by IVP and CapitalG with NVIDIA participating as a $150M anchor. The Information reported in late May 2026 that the company is in talks to raise approximately $1B at an $11B valuation; that round had not closed at time of writing.
Customers and Distribution
Baseten’s annualised revenue ramped from approximately $200M in December 2025 to approximately $600M in March 2026 per CEO podcast interviews and corroborating Tech Startups coverage — described in industry coverage as among the steepest inference-platform ramps on record. Customer disclosures across the Series D and Series E cycle include Descript, Patreon, Writer and a multi-category base spanning developer-led adoption and enterprise procurement. Distribution sits across three motions: direct developer onboarding via the Truss open-source framework and the Baseten self-serve platform, direct enterprise sales for production workloads, and partner-and-platform alignment with NVIDIA following the Series E anchor investment. The company has not separately disclosed precise paid-customer count, headcount, or gross-margin shape in primary sources; we rely on the Series D and Series E blog posts and named-press triangulation for the cited figures.
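As a rough check on how steep that ramp is, the two disclosed points can be turned into a multiple and an implied monthly growth rate. This is a minimal sketch: the three-month span and the smooth-compounding assumption are ours, not Baseten disclosures, and both revenue figures are approximate.

```python
# Approximate annualised revenue figures from the paragraph above, in USD millions.
START_REVENUE_MUSD = 200.0  # December 2025
END_REVENUE_MUSD = 600.0    # March 2026
MONTHS_ELAPSED = 3          # assumption: December to March treated as three monthly steps


def ramp_multiple(start: float, end: float) -> float:
    """How many times larger the end figure is than the start figure."""
    return end / start


def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Constant month-over-month growth rate that would turn start into end.

    Assumes smooth compounding, which is an illustrative simplification.
    """
    return (end / start) ** (1.0 / months) - 1.0


multiple = ramp_multiple(START_REVENUE_MUSD, END_REVENUE_MUSD)
monthly_growth = implied_monthly_growth(
    START_REVENUE_MUSD, END_REVENUE_MUSD, MONTHS_ELAPSED
)

print(f"Ramp multiple: {multiple:.1f}x")
print(f"Implied compound monthly growth: {monthly_growth:.1%}")
```

On these inputs the ramp is 3.0x, or roughly 44% compounded per month. That is a descriptive figure for the disclosed window, not a forecast.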
Model Strategy
Baseten is a Verticals-first play under the IM Framework eight-trajectories taxonomy as it applies to AI inference: the strategic bet is that specialised inference infrastructure with multi-model + multi-cloud orchestration beats hyperscaler general-purpose model-serving on tail-latency, throughput-per-dollar and the production-deployment workflow for AI applications. The foundation-model stack is deliberately model-agnostic — Baseten serves open-source models (Meta Llama, Mistral, DeepSeek), customer fine-tunes and bespoke custom models across a multi-cloud deployment surface, with the NVIDIA Series E anchor aligning silicon supply for the inference workload. Above the foundation-model layer, the Truss open-source framework is the developer onboarding surface; the Baseten Inference Stack is the production runtime; consumption-based per-token and per-second pricing is the monetisation surface. The thesis is that inference moves from one-third of AI compute spend in 2023 to roughly two-thirds by end-2026 (Srivastava framing at HumanX 2026), and that the orchestration surface for that spend is the structural prize regardless of which foundation models win the frontier-capability race.
At A Glance
The Numbers
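Annualised revenue: approximately $600M (March 2026)
Headcount (FTE): not disclosed; low hundreds per LinkedIn-visible data (mid-2026)
Funding to date: approximately $590M+ through the January 2026 Series E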
Leadership Team
Baseten is founder-led with all four co-founders remaining in operating roles through the Series E cycle. Senior recruiting has come from infrastructure-adjacent companies including Gusto (where Srivastava and Haghighat met) and the AI/ML platform cohort. CFO, CRO and CTO roles are not separately publicly named at time of writing; the company has not disclosed precise headcount in primary sources, though LinkedIn-visible data places it in the low-hundreds range as of mid-2026.
IM Framework Scoring
IM’s structured assessment of Baseten’s competitive position. The summary is the headline; the full analysis gives the per-dimension reasoning and evidence.
Funding History
| Date | Round | Raised | Post-money | Lead investor(s) |
|---|---|---|---|---|
| Jan 2026 | Series E | $300M | $5B | IVP and CapitalG (with NVIDIA $150M anchor) |
| Sep 2025 | Series D | $150M | $2.15B | BOND |
| Mar 2024 | Series C | $75M | ~$825M | IVP |
| 2023 | Series B | $40M | — | Spark Capital |
| 2022 | Series A | $20M | — | Greylock |
Cumulative disclosed external capital is approximately $590M+ through the January 2026 $300M Series E at a $5B valuation, led by IVP and CapitalG, with NVIDIA participating as a $150M anchor investor and previous backers BOND, Greylock and Spark following on. The Information reported in late May 2026 that Baseten is in talks to raise approximately $1B at an $11B valuation; that round had not closed at time of writing. The Series E followed the September 2025 $150M Series D at $2.15B post-money. Details of earlier rounds (Series C, B, A and Seed) come from named-press cycles and Baseten’s own blog. We rely on Baseten’s primary blog and PYMNTS, Tech Startups and TechCrunch coverage for round dates and valuations, and we decline to publish any figure that appears only on Tracxn or PitchBook.
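The cumulative figure can be cross-checked against the rounds listed in the table. The sketch below sums the itemised raises and computes the valuation step-up between consecutive rounds that carry a post-money figure. The data structure and function names are illustrative, and the Series C post-money is approximate in the source.

```python
# Disclosed rounds from the funding table, in USD millions, oldest first.
# post_money is None where the table shows no figure; the Series C
# post-money is approximate in the source (~$825M).
ROUNDS = [
    {"date": "2022", "round": "Series A", "raised": 20, "post_money": None},
    {"date": "2023", "round": "Series B", "raised": 40, "post_money": None},
    {"date": "Mar 2024", "round": "Series C", "raised": 75, "post_money": 825},
    {"date": "Sep 2025", "round": "Series D", "raised": 150, "post_money": 2150},
    {"date": "Jan 2026", "round": "Series E", "raised": 300, "post_money": 5000},
]


def total_raised(rounds):
    """Sum of the itemised raises, in USD millions."""
    return sum(r["raised"] for r in rounds)


def valuation_step_ups(rounds):
    """Post-money ratio between consecutive rounds that disclose a valuation."""
    valued = [r for r in rounds if r["post_money"] is not None]
    return [
        (prev["round"], curr["round"], curr["post_money"] / prev["post_money"])
        for prev, curr in zip(valued, valued[1:])
    ]


table_total = total_raised(ROUNDS)
step_ups = valuation_step_ups(ROUNDS)

print(f"Itemised rounds total: ${table_total}M")
for earlier, later, ratio in step_ups:
    print(f"{earlier} -> {later}: {ratio:.2f}x post-money step-up")
```

The itemised rounds sum to $585M. The remainder of the approximately $590M+ cumulative figure is not itemised in the table; the Seed round is mentioned in the text but not listed.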
Competitive Landscape
| Competitor | Positioning | Distribution edge | Threat profile |
|---|---|---|---|
| Bedrock (Amazon AWS) | Amazon’s hyperscaler-native inference service exposing Anthropic, Meta, Mistral, Cohere and AWS-first-party models through a single managed API, positioned as the default inference surface for customers already inside the AWS commit envelope. | Direct AWS console and AWS sales channel, IAM-and-VPC-native procurement, and consumption pricing that lands inside existing AWS enterprise contracts; the moat is the AWS enterprise commit and the regulated-buyer compliance posture rather than per-token price. | High: the hyperscaler-native inference service with the broadest model catalogue and the deepest enterprise distribution; bundles inference into existing AWS contracts. |
| Fireworks AI | Pure-play inference platform for open-source and fine-tuned models with an aggressive performance-per-dollar pitch on Llama, Mistral, DeepSeek and Qwen; positioned as the developer-led mirror of Baseten on the multi-model serving lane. | Direct self-serve developer onboarding plus a direct enterprise sales motion for production workloads; consumption-based per-token pricing and integrations into the LangChain / LlamaIndex / vector-database developer ecosystem. | High — the closest direct mirror on the multi-model inference platform positioning, with comparable ARR trajectory and a similar open-source-model-first stance. |
| Together AI | Multi-model inference and fine-tuning platform with a stated research arm and a deep open-source model catalogue (Llama, DeepSeek, Mixtral, Qwen, FLUX); positioned as a research-flavoured pure-play competitor on the same inference-orchestration primitive as Baseten. | Direct developer self-serve via Together API plus a direct enterprise channel; consumption pricing on inference and dedicated-endpoint contracts for production customers, with research-publication mindshare driving the developer funnel. | High — comparable open-source model serving + custom-model fine-tuning surface; competing head-to-head on developer-led inference deployment. |
| Replicate | Developer-first inference platform with a long-tail open-source model catalogue accessed through a Docker-and-Cog packaging pattern; positioned as the easy-onramp model-zoo for individual developers and prototype workloads rather than enterprise procurement. | Direct developer self-serve and pay-per-second consumption pricing; community-contributed model catalogue and a Cog-on-GitHub developer funnel are the principal channel and moat. | Medium — developer-first inference platform with a strong long-tail model catalogue; flanking risk on the developer-onboarding surface rather than enterprise procurement. |
| NIM microservices (NVIDIA) | NVIDIA’s first-party inference-microservices distribution wrapping CUDA-optimised model containers for deployment on any NVIDIA-accelerated infrastructure; positioned as the silicon vendor’s own inference layer on the hardware Baseten itself runs on. | Direct NVIDIA enterprise channel, bundled with NVIDIA AI Enterprise licences and pre-installed across DGX Cloud and NVIDIA-partner hyperscaler offerings; silicon-vendor lock-in is the structural moat. | Medium-high: NVIDIA’s first-party inference distribution layered on top of the silicon Baseten itself depends on; a combination of structural alignment and competitive overlap that the Series E investment partially de-risks. |
Potential Risks
Hyperscaler-native inference substitution
The principal structural risk is that AWS Bedrock, Azure AI Foundry and Google Vertex AI absorb the inference-platform lane by bundling inference into broader cloud commitments. Baseten’s multi-cloud + open-source-model architecture is a defensible counter-position, but the procurement gravity of hyperscaler enterprise contracts is real and the substitution dynamic is the most-watched competitive variable through 2026.
Foundation-model supplier dependence
Baseten is a serving layer for upstream foundation-model providers (Meta Llama, Mistral, DeepSeek, Anthropic-served-via-API and customer fine-tunes). Capability shifts at the model-provider tier — including model-provider direct-serving moves — propagate directly into Baseten’s value proposition. The Series E NVIDIA participation aligns silicon supply but does not insulate against the model-provider direct-serving substitution.
Pure-play inference competitive cadence
Fireworks AI and Together AI are structurally symmetric pure-play competitors with comparable ARR trajectories and similar open-source-model-first stances. Head-to-head competition on benchmarks and pricing compresses gross margins and makes it harder to grow the approximately $600M annualised revenue (March 2026 disclosure) into the $11B valuation implied by the reported May 2026 funding talks.
Valuation-to-ARR multiple at the reported $11B mark
The reported $11B valuation in the May 2026 funding talks (per The Information headline and Tech Startups summary), set against the approximately $600M annualised revenue disclosed for March 2026, implies a multiple of roughly 18x, which is high for a pure-inference platform. The bull case is that the revenue ramp continues at the disclosed pace and the multiple resolves through growth. The bear case is that the multi-model future thesis compresses as hyperscaler bundles absorb the lane.
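The multiple can be worked through from the figures on this page. The sketch below is illustrative only: pairing the January 2026 Series E valuation with the December 2025 revenue figure is our assumption (the dates do not align exactly), and the 10x target multiple is a hypothetical input chosen for the example, not a Baseten or IM figure.

```python
def revenue_multiple(valuation_musd: float, annualised_revenue_musd: float) -> float:
    """Valuation divided by annualised revenue (both in USD millions)."""
    return valuation_musd / annualised_revenue_musd


def revenue_needed(valuation_musd: float, target_multiple: float) -> float:
    """Annualised revenue (USD millions) at which the valuation equals the target multiple."""
    return valuation_musd / target_multiple


# Reported May 2026 talks: $11B valuation against ~$600M annualised revenue (March 2026).
reported_multiple = revenue_multiple(11_000, 600)

# Series E: $5B valuation (January 2026) paired with ~$200M annualised revenue
# (December 2025). The pairing of these two dates is an assumption.
series_e_multiple = revenue_multiple(5_000, 200)

# Hypothetical target multiple, for illustration only.
HYPOTHETICAL_TARGET_MULTIPLE = 10.0
revenue_at_target = revenue_needed(11_000, HYPOTHETICAL_TARGET_MULTIPLE)

print(f"Reported-round multiple: {reported_multiple:.1f}x")
print(f"Series E multiple: {series_e_multiple:.1f}x")
print(f"Revenue needed for {HYPOTHETICAL_TARGET_MULTIPLE:.0f}x at $11B: ${revenue_at_target:,.0f}M")
```

On these inputs the reported round prices at about 18.3x, below the 25x implied at the Series E, because revenue grew faster than valuation over the period. "Resolving through growth" to the hypothetical 10x would require about $1.1B of annualised revenue at an unchanged $11B valuation.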
Headcount and execution scale-up
Baseten is in the low-hundreds-employee range with founder-led senior leadership and no separately disclosed CFO, CRO or CTO at time of writing. With annualised revenue above $600M, scaling against hyperscaler-incumbent procurement and well-funded pure-play rivals is a material execution risk; the executive-bench appointments through 2026 are a key watch-item.
Recent IM Coverage
- AI Infrastructure — sector landing May 2026.
- AI Tracker — methodology and universe May 2026.
Recent press coverage of Baseten
- Jan 2026 — Baseten raises $300M Series E at $5B valuation, with $150M anchor from NVIDIA (TechCrunch)
- Sep 2025 — Baseten Secures $150M Series D as the Premier Inference Platform for AI’s App Layer (BusinessWire)
- May 2026 — Inference Firm Baseten Eyes Funding Round at $11 Billion Valuation (PYMNTS)
- May 2026 — AI inference startup Baseten in talks to raise $1 billion at $11 billion valuation (Tech Startups)
- Jan 2026 — Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future (HPCwire)
- Sep 2025 — Baseten Series D: building the inference platform for AI’s app layer (Baseten Blog)
Source register for the figures on this page
IM operates a primary-source-where-possible discipline. The figures above come from:
- Revenue: Baseten’s annualised revenue ramped from approximately $200M in December 2025 to approximately $600M in March 2026 per CEO Tuhin Srivastava’s No Priors / Latent Space podcast cycle and corroborating Tech Startups coverage. The Series D announcement disclosed the $200M annualised figure for the Q4 2025 mark.
- Customer accounts: Baseten discloses serving hundreds of production customers, including Descript, Patreon, Writer and a multi-category developer-and-enterprise base, across the Series D blog post and Series E announcements. Precise paid-customer count is not disclosed in primary sources; we decline to publish a specific figure pending company disclosure.
- Headcount: Baseten does not publicly disclose precise headcount. LinkedIn-visible company-page data places the company in the low-hundreds range as of mid-2026; we decline to publish a precise figure and reference the careers page as the canonical entry point.
- Funding to date: Cumulative external capital approximately $590M+ through the January 2026 $300M Series E at $5B valuation, led by IVP and CapitalG with NVIDIA as $150M anchor and BOND, Greylock and Spark Capital following on. The May 2026 $1B-at-$11B round reported by The Information had not closed at time of writing.
Methodology & Disclaimer
For metric definitions, source-tier hierarchy, and decline-to-publish rules, see the tracker methodology. Confidence dots (• green / • amber / • red) follow the same convention as the AI Tracker.
Spotted a figure you believe is wrong? Send corrections to info@informationmatters.net.
Information Matters Framework scores are the considered opinion of the IM team — human and AI — applied to publicly-available evidence under a disclosed methodology. They are not statements of fact about the companies scored and they are not investment advice.
