Sarvam Makes 30B and 105B Parameter AI Models Public and Open Source

Sarvam AI has open-sourced two large language models, Sarvam 30B and Sarvam 105B, and made them live for developers to use right now. Both are released under the Apache 2.0 license, and the weights are available for download (including via Hugging Face and AI Kosh) alongside access through Sarvam’s API.

The bigger point is not just “two new models dropped”. Sarvam says these models were built from scratch in India, trained end to end (pre-training, supervised fine-tuning, reinforcement learning) using internally prepared datasets, with compute provided through the IndiaAI Mission. That combination is what turns this into a milestone for India’s AI stack.

If you’re a builder, the takeaway is simple: you can try them today, download the weights, self-host, fine-tune, or just call the hosted endpoints and ship something without waiting on a closed model’s pricing, policy changes, or regional availability.

What Sarvam just released (and why it matters)

Sarvam has launched:

  • Sarvam 30B: a 30 billion parameter model designed for speed and efficient deployment.
  • Sarvam 105B: a 105 billion parameter model designed for deeper reasoning, coding, and longer context work.

“Live” here matters as much as “open-source”. Live typically means you can actually use it through hosted inference (APIs, demos, product surfaces), while also having the option to pull the downloadable weights for local and enterprise deployment. Sarvam is doing both.

Why this matters for Indian AI, specifically:

  • Domestic frontier-ish capability: training large models from scratch is still hard, mostly because compute, data pipelines, and optimization work are non-trivial at this scale.
  • India-first language coverage: Sarvam states support for all 22 official Indian languages, with a strong push toward the ten most widely spoken languages during training.
  • Developer access: open weights + permissive licensing makes it realistic for startups, labs, and enterprises to build on top of these models without negotiating bespoke contracts.

What you can do today: test, download, fine-tune, run evals, and deploy behind your own product.

30B vs 105B: which model is for what

Think of these as two different tools, not just “small vs big”.

Sarvam 30B: the fast, efficient workhorse

Sarvam 30B is positioned for high throughput tasks where latency and cost matter:

  • chat and Q&A
  • summarization
  • customer support assistants
  • internal copilots for ops teams
  • voice-first short turn conversations (where quick responses matter)

Sarvam has also said Sarvam 30B activates only about 2.4B parameters at a time, which implies a sparse style of execution. In practical terms, that can reduce inference compute compared to a dense model of the same headline size, depending on how it is served.

Sarvam 105B: the heavy lifter

Sarvam 105B is positioned for complex, multi-step work:

  • reasoning heavy tasks
  • coding and debugging
  • long form synthesis across many documents
  • agentic workflows (tool use, planning, multi-step execution)
  • long context tasks (Sarvam has stated a 128,000-token context window for 105B)

This model powers Indus, Sarvam’s flagship AI assistant.

A simple explanation of “parameter count”

Parameters are basically the learned weights inside the neural network. More parameters often (not always) means more capacity. But what you feel as a user is usually:

  • latency (how fast the model responds)
  • throughput (how many tokens per second, how many concurrent users)
  • memory needs (GPU VRAM, KV cache pressure, context window cost)
  • stability at long context (does it stay coherent or drift)

So, yes, 105B is bigger, but the real difference shows up in your infrastructure bill and response time.
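To make the "infrastructure bill" point concrete, here is a back-of-envelope memory calculation. The layer count, KV-head count, and head dimension below are illustrative assumptions, not Sarvam's published specs:

```python
# Back-of-envelope serving memory for an LLM. All architecture numbers
# here are illustrative assumptions, not Sarvam's published specs.

def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone (fp16/bf16 = 2 bytes per parameter)."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache for ONE sequence: two tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1024**3

# A dense 30B model in bf16, weights only:
print(f"weights: {weight_memory_gb(30):.0f} GB")

# Hypothetical config: 48 layers, 8 KV heads, head_dim 128, 128K context:
print(f"KV cache at 128K context: {kv_cache_gb(48, 8, 128, 128_000):.1f} GB")
```

Notice the KV cache for a single long-context sequence can rival a large slice of the weight memory, which is why attention-efficiency tricks matter so much at 128K.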

A quick note on efficient deployment: some large models use techniques like Mixture-of-Experts so they do not “activate” all parameters for every token. Sarvam explicitly states an MoE Transformer architecture, so efficiency is part of the design, not an afterthought.

Under the hood: architecture choices that drive efficiency

At 30B and 105B scale, architecture decisions show up directly in serving cost. Sarvam has shared a few specifics here, and they line up with what modern large model stacks do to stay deployable.

Mixture-of-Experts (MoE), at a high level

Sarvam says these models are based on a Mixture-of-Experts Transformer.

MoE is the idea of having multiple “expert” subnetworks, but only activating a subset per token. Instead of running the entire network for every token, the model routes tokens to a few experts. You get:

  • potentially better quality for a given inference cost
  • a path to scale parameter count while keeping per-token compute lower than a dense model of the same total size

MoE is not free. Routing, expert balance, and serving complexity are real concerns. But it is one of the main ways to build large models that can still be offered at reasonable cost.
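The routing idea can be sketched in a few lines. This is a toy top-k MoE layer for illustration only (random weights, NumPy), not Sarvam's implementation:

```python
# Toy sketch of top-k Mixture-of-Experts routing: a router scores each
# token against every expert, and only the top_k experts actually run.
# Illustrative only -- random weights, not any real model's architecture.
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 8, 2, 16
router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model). Each token activates only top_k of n_experts."""
    logits = x @ router_w                          # (tokens, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, chosen[t]]
        gate = np.exp(sel - sel.max())             # softmax over selected
        gate /= gate.sum()
        for g, e in zip(gate, chosen[t]):
            out[t] += g * (x[t] @ experts[e])      # only 2 of 8 experts run
    return out

y = moe_forward(rng.standard_normal((4, d_model)))
print(y.shape)
```

Per token, only `top_k / n_experts` of the expert compute runs, which is the mechanism behind claims like "activates only ~2.4B of 30B parameters".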

Grouped Query Attention (GQA) in Sarvam 30B

Sarvam 30B uses Grouped Query Attention (GQA). The basic advantage is that it reduces the memory footprint of attention during inference, especially the KV cache. KV cache grows with context length, so anything that reduces that pressure helps with:

  • higher concurrency on the same GPU
  • longer context at usable speed
  • lower serving cost

Multi-head Latent Attention (MLA) in Sarvam 105B

Sarvam 105B incorporates Multi-head Latent Attention (MLA), which Sarvam says improves efficiency when processing longer context windows.

Attention is one of the major cost centers in Transformers, especially as context windows grow. Designs like MLA aim to reduce attention compute or memory overhead (or both), which becomes critical when you talk about 128K context and real usage, not just a spec on a slide.

What all of this translates to in the real world

When these choices work well, you usually see:

  • lower latency at a given quality level
  • better throughput per GPU
  • context scaling that does not explode your costs as quickly
  • the ability to serve “bigger model behavior” without needing absurd hardware for every deployment

Multilingual focus: Indian languages, Hinglish, and voice-first use cases

Sarvam states the models support all 22 official Indian languages and are optimized for voice-first interactions, including Hinglish.

Multilingual AI is hard in ways that benchmarks often hide:

  • multiple scripts and orthography rules
  • code-mixing (Hinglish, Tanglish, etc.) in the same sentence
  • uneven data distribution across languages and domains
  • noise from OCR and ASR pipelines
  • weak or missing evaluation sets for local intents

What “good Indian language support” should look like, in practice:

  • stable translation and transliteration behavior
  • handling mixed language queries without forcing “pure Hindi” or “pure English”
  • understanding local intents like payments, travel, KYC, government service flows
  • producing responses that are short, clear, and ASR friendly for voice bots

This is where open weights can help the ecosystem. Enterprises can evaluate on their own call logs and domains, and fine-tune for the exact tone and vocabulary their users expect.

Benchmarks to look at (and how to interpret them)

Sarvam has published several benchmark claims, especially for Sarvam 105B:

| Benchmark | Sarvam-105B | GLM-4.5-Air (106B) | GPT-OSS-120B | Qwen3-Next-80B-A3B-Thinking |
| --- | --- | --- | --- | --- |
| GENERAL | | | | |
| Math500 | 98.6 | 97.2 | 97.0 | 98.2 |
| Live Code Bench v6 | 71.7 | 59.5 | 72.3 | 68.7 |
| MMLU | 90.6 | 87.3 | 90.0 | 90.0 |
| MMLU Pro | 81.7 | 81.4 | 80.8 | 82.7 |
| Arena Hard v2 | 71.0 | 68.1 | 88.5 | 68.2 |
| IF Eval | 84.8 | 83.5 | 85.4 | 88.9 |
| REASONING | | | | |
| GPQA Diamond | 78.7 | 75.0 | 80.1 | 77.2 |
| AIME 25 (w/ tools) | 88.3 (96.7) | 83.3 | 90.0 | 87.8 |
| HMMT (Feb 25) | 85.8 | 69.2 | 90.0 | 73.9 |
| HMMT (Nov 25) | 85.8 | 75.0 | 90.0 | 80.0 |
| Beyond AIME | 69.1 | 61.5 | 51.0 | 68.0 |
| AGENTIC | | | | |
| BrowseComp | 49.5 | 21.3 | 38.0 | — |
| SWE Bench Verified (SWE-Agent Harness) | 45.0 | 57.6 | 50.6 | 34.46 |
| Tau2 (avg.) | 68.3 | 53.2 | 65.8 | 55.0 |
A few caveats when reading these numbers:

  • benchmarks can be sensitive to prompt format, tool access, and decoding settings
  • contamination risks exist across the industry, so you still want task specific evals
  • a high score does not guarantee reliability in customer support, compliance, or domain specific workflows

Where to access the models: Hugging Face, AI Kosh, and APIs

Sarvam says the models are available through:

  • Sarvam AI’s API (live hosted access)
  • Hugging Face (weights and model assets)
  • AI Kosh (India-linked distribution)

What this means for India’s AI stack: IndiaAI Mission and local infrastructure

Sarvam says these models were developed using compute resources provided through the IndiaAI Mission. That matters because training and serving large models hits real constraints:

  • GPU availability and supply chain limitations
  • power and cooling
  • high speed networking for distributed training
  • the lack of shared evaluation infrastructure for Indian language tasks

National initiatives can help by improving:

  • compute access for training and fine-tuning
  • shared datasets and evaluation suites (especially multilingual)
  • deployment blueprints for public sector and regulated industries

Sarvam’s release also has an ecosystem effect. Startups can build on top, academia can reproduce and test, enterprises can run controlled pilots, and public sector teams can evaluate models on local workflows without being blocked by closed model access.

The models were also unveiled around the India AI Impact Summit 2026, which helped put attention on domestic model building rather than only application layering.

How Sarvam stacks up against global models (and how to compare honestly)

| Benchmark | Sarvam-105B | Deepseek R1 0528 | Gemini-2.5-Flash | o4-mini | Claude 4 Sonnet |
| --- | --- | --- | --- | --- | --- |
| AIME25 | 88.3 | 87.5 | 72.0 | 92.7 | 70.5 |
| HMMT Feb 2025 | 85.8 | 79.4 | 64.2 | 83.3 | 75.6 |
| GPQA Diamond | 78.7 | 81.0 | 82.8 | 81.4 | 75.4 |
| Live Code Bench v6 | 71.7 | 73.3 | 61.9 | 80.2 | 55.9 |
| MMLU Pro | 81.7 | 85.0 | 82.0 | 81.9 | 83.7 |
| Browse Comp | 49.5 | 3.2 | 20.0 | 28.3 | 14.7 |
| SWE Bench Verified | 45.0 | 57.6 | 48.9 | 68.1 | 66.6 |
| Tau2 Bench | 68.3 | 62.0 | 49.7 | 65.9 | 64.0 |
| HLE | 11.2 | 8.5 | 12.1 | 14.3 | 9.6 |

Sarvam has made performance claims and shared benchmark results, but the cleanest approach is still to run side-by-side eval prompts on your own domain, especially for Indian language and voice scenarios.

What you can build now: assistants, Samvaad, and Indus-style experiences

Sarvam 30B already powers Samvaad, Sarvam’s conversational platform. Sarvam 105B powers Indus, the company’s assistant.

Those map to app patterns teams can build immediately:

  • chat assistants for support and sales
  • RAG systems for enterprise knowledge bases
  • voice bots for IVR and WhatsApp style interactions
  • coding copilots for internal developer platforms
  • workflow agents that call tools and execute multi-step tasks
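The RAG pattern from the list above boils down to: embed your documents, embed the query, and put the nearest chunks into the prompt. A toy sketch of just the retrieval step, using bag-of-words vectors as stand-in embeddings (a real system would call an embedding model):

```python
# Toy retrieval step of a RAG pipeline. Bag-of-words vectors stand in
# for real embeddings; the documents are invented examples.
import math
import re
from collections import Counter

docs = [
    "KYC requires an Aadhaar card and a recent photograph.",
    "Refunds are processed within 7 working days.",
    "Roaming packs can be activated by SMS.",
]

def embed(text: str) -> Counter:
    # crude "embedding": lowercase word counts
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

print(retrieve("what documents do I need for KYC?"))
```

Swap the toy `embed` for a real embedding model and feed the retrieved chunks into the LLM prompt, and you have the skeleton of an enterprise knowledge-base assistant.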

Deployment patterns are pretty straightforward:

  • hosted API if you want speed and minimal ops
  • self hosting if you need control, privacy, and predictable costs at scale
  • fine-tuning if you need domain tone, terminology, and behavior

Examples that fit Indian contexts well:

  • banking or telecom support bots that understand code-mixed queries
  • education tutors for exam prep (Sarvam has shown performance on JEE Mains 2026 style questions)
  • government service assistants that can guide users through forms, eligibility, and document lists in local languages

The bigger story: trust, openness, and geopolitics around LLMs

Open releases matter more now because AI is getting pulled into policy and procurement debates. Governments and labs are clashing over access, restrictions, and safety requirements. One high profile example in the broader AI policy conversation is the Pentagon and Anthropic dispute, which has been linked to new US guidance around government AI contracts.

In that environment, openness gives builders and organizations:

  • more resilience through local deployment options
  • reduced dependency on a single vendor’s availability or policy shifts
  • more transparent evaluation, even if training data details are never fully perfect

The practical impact is what changes for teams shipping real systems in India and outside it: more choice, more control, and more ability to audit behavior.

Wrap-up: the practical takeaway from Sarvam’s 30B and 105B open release

What’s new is clear:

  • two model sizes: Sarvam 30B and Sarvam 105B
  • live access via API and product surfaces
  • open-source weights available on Hugging Face and AI Kosh
  • Apache 2.0 licensing for broad commercial and enterprise use
  • models built and trained end to end in India, using IndiaAI Mission compute, with multilingual focus across Indian languages

How to pick:

  • Sarvam 30B for throughput, cost control, and real time assistants
  • Sarvam 105B for deeper reasoning, coding, long context, and agentic workflows

Next steps before adopting:

  • read the model card and eval methodology
  • verify license plus any usage policy notes
  • run your own benchmark prompts, especially Indian language and domain specific evals
  • test long context behavior on your real documents, not just synthetic cases
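The "run your own benchmark prompts" step can be as simple as a loop over prompt/answer pairs. A minimal harness sketch, where `dummy_model` is a placeholder for whatever client you actually use (hosted API or local inference):

```python
# Minimal side-by-side eval harness sketch: run the same prompts through
# a model callable and score substring matches. The model callable here
# is a dummy placeholder; plug in your real API or local client.

def run_eval(prompts_and_answers, call_model) -> float:
    """Return the fraction of prompts whose reply contains the answer."""
    correct = 0
    for prompt, expected in prompts_and_answers:
        reply = call_model(prompt)
        correct += expected.strip().lower() in reply.strip().lower()
    return correct / len(prompts_and_answers)

# Tiny illustrative set; real evals should use your own domain data,
# including Indian-language and code-mixed prompts.
evalset = [
    ("2 + 2 = ?", "4"),
    ("Capital of India?", "New Delhi"),
]

def dummy_model(prompt: str) -> str:  # stand-in for a real model call
    canned = {"2 + 2 = ?": "The answer is 4.",
              "Capital of India?": "New Delhi"}
    return canned[prompt]

print(run_eval(evalset, dummy_model))
```

Run the same `evalset` against each candidate model and compare the scores; substring matching is crude, so graduate to exact-match or judged scoring for anything serious.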

The broader implication is hard to miss. India now has stronger open LLM options that can actually be put into production, and that changes what local teams can build without waiting for someone else’s roadmap.

