LlamaIndex
Open-source agent-and-data framework for building knowledge agents over enterprise data, comprising the LlamaIndex Python library, the LlamaParse document-parsing engine and the LlamaCloud managed platform. Headline metrics: a waitlist of 10,000+ organisations, including 90 Fortune 500 companies, and a $19M Series A led by Norwest Venture Partners in March 2025.
The Business
LlamaIndex is an open-source agent-and-data framework company founded in 2023 by Jerry Liu and Simon Suo (both former Uber AI Labs research scientists). Its product line spans three surfaces. The first is the open-source LlamaIndex library, distributed on PyPI and a category-defining retrieval-augmented generation (RAG) framework for retrieval, indexing and document Q&A. The second is LlamaParse, a document-parsing engine that turns PDFs and other complex unstructured documents into structured data for LLM consumption. The third is LlamaCloud, the managed platform that packages the open-source library as a turn-key service for agentic knowledge management over unstructured data. The company is privately held and has raised approximately $28.5M in disclosed external capital, most recently a $19M Series A on March 4, 2025, led by Norwest Venture Partners with continued participation from existing investor Greylock. Databricks Ventures’ AI Fund and KPMG Ventures followed with strategic minority equity investments of undisclosed size in May 2025. The Series A coincided with the general-availability launch of LlamaCloud, now the company’s principal managed-platform commercial surface.
Customers and Distribution
LlamaIndex does not publish ARR, revenue or paying-customer counts in primary sources. The headline distribution metric in the March 2025 Series A announcement is a waitlist of more than 10,000 organisations, including 90 Fortune 500 companies; how many of these are paying customers is not disclosed. Distribution runs through four channels: the open-source LlamaIndex library on PyPI, the primary developer funnel; LlamaCloud and LlamaParse, the managed-platform tier that converts open-source adoption into commercial revenue; the Databricks Ventures and KPMG Ventures strategic-investor relationships as enterprise-channel anchors (Databricks for data-platform integration, KPMG for enterprise-consulting distribution); and a direct enterprise sales motion for the LlamaCloud enterprise tier following the March 2025 Series A. This open-source-to-managed motion mirrors LangChain’s in the agent-framework lane.
Model Strategy
LlamaIndex’s strategy is specialisation in document RAG and knowledge agents. The framework is purpose-built for retrieval-augmented generation, document Q&A and structured-data agents, giving it depth on document-heavy enterprise knowledge-agent workloads that horizontal competitors (LangChain on the broader agent lane, CrewAI on multi-agent orchestration, AutoGen on Microsoft’s open-source surface) do not match on that specific surface. The platform architecture has three layers: LlamaParse for ingesting complex unstructured documents; the LlamaIndex library for retrieval, indexing and agent primitives; and LlamaCloud as the managed layer that monetises open-source adoption. On the supplier side, the framework is model-agnostic, routing to Anthropic, OpenAI, Google, Mistral, Meta and open-source models; on the data side, the document-RAG specialism is the intended vertical-depth moat. The May 2025 strategic minority equity investments from Databricks Ventures and KPMG Ventures anchor distribution into the data-platform and enterprise-consulting channels respectively.
At A Glance
The Numbers
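| Metric | Value |
|---|---|
| Annualised revenue | Not disclosed |
| Monthly PyPI downloads | Not stated |
| Headcount (FTE) | Not disclosed |
| Funding to date | ~$28.5M plus undisclosed May 2025 strategic investments |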
Leadership Team
LlamaIndex is a founder-led open-source agent-framework company; co-founders Liu and Suo have remained in place from the 2023 founding through the March 2025 Series A. Senior appointments below the founder layer have not been publicly disclosed in primary sources at the time of writing. At this Series A scale-up stage, developer-first platform velocity matters more than enterprise-software-style senior bench depth.
IM Framework Scoring
IM’s structured assessment of LlamaIndex’s competitive position.
Funding History
| Date | Round | Raised | Post-money | Lead investor(s) |
|---|---|---|---|---|
| May 2025 | Strategic minority equity | Undisclosed | — | Databricks Ventures’ AI Fund (with KPMG Ventures) |
| Mar 2025 | Series A | $19M | — | Norwest Venture Partners (with Greylock) |
| Jun 2023 | Seed | $8.5M | — | Greylock Partners |
| 2022 | Pre-seed | ~$1M | — | Angel cohort |
Cumulative external capital is approximately $28.5M across the pre-seed, seed and Series A rounds (~$1M + $8.5M + $19M), plus the May 2025 strategic minority equity investments from Databricks Ventures’ AI Fund and KPMG Ventures (amounts undisclosed in primary sources). The headline round is the March 2025 $19M Series A led by Norwest Venture Partners, with continued participation from existing investor Greylock, per the company’s own announcement and named-press coverage. We rely on the company’s blog and named-press coverage for round figures and decline to publish any figure that appears only on Tracxn or PitchBook.
Competitive Landscape
| Competitor | Positioning | Distribution edge | Threat profile |
|---|---|---|---|
| LangChain | The broadest open-source LLM application framework (chains, agents, tools, integrations) plus a paid commercial stack (LangSmith observability, LangGraph orchestration). Positioned as the horizontal default for LLM app developers. | Open-source GitHub funnel (>90k stars) into LangSmith / LangGraph Platform paid conversion; aggressive enterprise GTM following the Oct 2025 $125M Series B at $1.25B post-money. | High: the closest symmetric open-source agent-framework competitor, at materially greater capital scale; head-to-head on the developer-first agent-framework lane and on open-source-to-managed-platform conversion economics. |
| CrewAI | Open-source multi-agent orchestration framework focused on role-based agents collaborating on tasks; narrower than LlamaIndex’s RAG-and-knowledge-agent surface but rising fast in the agent-team lane. | Open-source GitHub adoption (>40k stars) feeding the CrewAI Enterprise hosted platform; developer-led PLG plus a direct enterprise tier. | Medium-high: flanks LlamaIndex on the multi-agent lane but competes less directly on the document-RAG-and-knowledge-agent surface LlamaIndex anchors. |
| Anthropic Claude SDK / OpenAI Agents SDK / Google Gemini SDK | First-party agent SDKs from the model labs themselves: OpenAI Agents SDK (March 2025), Anthropic’s Claude tool-use and computer-use SDK, and Google’s Gemini SDK / Vertex AI Agent Builder. Positioned as the native, vendor-blessed way to build agents on each model. | Bundled into each lab’s developer console and API account; reaches every existing API customer with zero new procurement and is the default starting point for new agent projects. | High and asymmetric: these SDKs come from the same model labs LlamaIndex routes to, so its suppliers double as rivals at the agent-framework layer. |
| Microsoft AutoGen / Semantic Kernel (Microsoft) | Microsoft’s two open-source agent libraries: AutoGen for multi-agent conversations (Microsoft Research origin) and Semantic Kernel for production .NET / Python orchestration. Positioned for Microsoft-stack enterprises building on Azure OpenAI. | GitHub open-source funnel into Azure AI Foundry / Azure OpenAI procurement; flows through Microsoft enterprise channels and field-engineering teams. | Medium: a structural competitor on the open-source agent-framework surface, with Microsoft enterprise-channel distribution flanking the developer-first lane. |
| Pinecone / Weaviate / Chroma | Vector-database providers moving up the stack — Pinecone Assistant (managed RAG), Weaviate Generative Search, Chroma’s hosted retrieval — positioning the storage layer as an end-to-end retrieval product rather than a passive backend. | Each ships an SDK and managed cloud (Pinecone serverless, Weaviate Cloud, Chroma Cloud) and is referenced from frontier-lab docs; reaches developers through their existing vector-DB integration points. | Adjacent rather than substitutive — vector-database providers are the storage layer LlamaIndex’s RAG framework routes to; competitive only where vector-DB vendors push upward into the framework lane (Pinecone Assistant, Weaviate Generative Search). |
Potential Risks
Symmetric competition with LangChain at greater capital scale
LangChain raised a $125M Series B at a $1.25B post-money valuation in October 2025, a materially larger round than LlamaIndex’s $19M Series A. The bull case is that LlamaIndex’s document-RAG-and-knowledge-agent specialism is a structural differentiator that compounds developer adoption faster than horizontal agent-framework competitors can; the bear case is that LangChain’s capital lets it fund broader product scope and a larger enterprise sales motion competing for the same agent-framework procurement budget.
Foundation-model first-party agent-SDK substitution
Anthropic Claude SDK, OpenAI Agents SDK and Google Gemini SDK ship first-party agent capability on the same models LlamaIndex itself routes to. The bull case is that the multi-model framework architecture is the moat against any single SDK substitution and that the document-RAG-and-knowledge-agent specialism is structurally adjacent rather than substitutive to first-party SDKs; the bear case is that sustained capability investment at Anthropic, OpenAI and Google on first-party agent SDKs compresses LlamaIndex library adoption on the same models.
Open-source-to-managed conversion economics at Series A scale
LlamaIndex’s commercial model at the Series A stage rests on converting open-source library adoption into LlamaCloud and LlamaParse managed-platform revenue. The Series A scale-up cycle is the structural test of those conversion economics, and LlamaCloud’s trajectory after its March 2025 general-availability launch is the commercial signal to watch through 2026. We decline to publish ARR figures sourced only from blocklisted aggregators.
Capital position relative to LangChain
LlamaIndex’s cumulative external capital of approximately $28.5M across the pre-seed, seed and Series A rounds is materially smaller than LangChain’s (~$160M+ cumulative through its Series B at $1.25B post-money). The bull case is that LlamaIndex’s capital efficiency in the document-RAG specialism, plus the May 2025 strategic-investor relationships with Databricks Ventures and KPMG Ventures, compensate; the bear case is that sustained competitive escalation from LangChain erodes LlamaIndex’s market position before it can raise a priced Series B.
Regulatory exposure — EU AI Act deployer-obligation regime
Enterprise agent deployments built on LlamaIndex primitives fall under the EU AI Act’s deployer obligations, which apply from August 2, 2026, particularly for document-handling use cases in regulated industries (financial services, healthcare, public sector). The framework itself carries light regulatory exposure at the platform layer, but customer deployments carry real compliance obligations. The risk is shared across the agent-framework cohort but is an active variable in the European enterprise channel.
Recent IM Coverage
- AI Infrastructure sector landing page Jun 2026.
- AI Tracker methodology Jun 2026.
Recent Press Coverage
- Mar 2025 — Series A Funding And LlamaParse GA Update — $19M Series A led by Norwest Venture Partners. (LlamaIndex Blog)
- Mar 2025 — LlamaIndex Harnesses the Power of Enterprise Data for AI Agent Workflows. (Norwest)
- Mar 2025 — LlamaIndex launches a cloud service for building unstructured data agents. (TechCrunch)
- Mar 2025 — LlamaIndex: $19 Million (Series A) Raised For Enterprise-Grade Knowledge Agents. (Pulse 2)
- Oct 2025 — LlamaIndex Newsletter 2025-10-14 — product updates and customer references. (LlamaIndex Blog)
Source Register
IM operates a primary-source-where-possible discipline. The figures above come from:
- Revenue (basis-disclosure note): LlamaIndex does not disclose ARR or revenue in primary sources at the time of writing. The March 2025 Series A announcement references a waitlist of more than 10,000 organisations, including 90 Fortune 500 companies, but gives no ARR figure. We decline to publish a precise revenue figure pending a primary disclosure.
- Customer accounts: LlamaIndex reports a waitlist of over 10,000 organisations including 90 Fortune 500 companies per the March 2025 Series A announcement. The waitlist-vs-paying-customer split is not separately disclosed.
- Headcount (basis-disclosure note): LlamaIndex is private and does not publish headcount in primary sources at the time of writing; the company’s careers page is the canonical entry point. We decline to publish a precise headcount figure.
- Funding to date: Cumulative external capital of approximately $28.5M through the March 2025 $19M Series A led by Norwest Venture Partners with continued participation from existing investor Greylock, plus strategic minority equity investments from Databricks Ventures’ AI Fund and KPMG Ventures in May 2025 (amounts undisclosed in primary sources). Prior rounds: a June 2023 $8.5M seed led by Greylock Partners and a ~$1M pre-seed in 2022.
Methodology & Disclaimer
For metric definitions, source-tier hierarchy, and decline-to-publish rules, see the tracker methodology. Confidence dots (• green / • amber / • red) follow the same convention as the AI Tracker.
Spotted a figure you believe is wrong? Send corrections to info@informationmatters.net.
Information Matters Framework scores are the considered opinion of the IM team — human and AI — applied to publicly-available evidence under a disclosed methodology. They are not statements of fact about the companies scored and they are not investment advice.
