Sarvam Makes 30B and 105B Parameter AI Models Public and Open Source

Sarvam AI has open-sourced two large language models, Sarvam 30B and Sarvam 105B, and made them live for developers to use right now. Both are released under the Apache 2.0 license, and the weights are available for download (including via Hugging Face and AI Kosh) alongside access through Sarvam’s API.

The bigger point is not just “two new models dropped”. Sarvam says these models were built from scratch in India, trained end to end (pre-training, supervised fine-tuning, reinforcement learning) using internally prepared datasets, with compute provided through the IndiaAI Mission. That combination is what turns this into a milestone for India’s AI stack.

If you’re a builder, the takeaway is simple: you can try them today, download the weights, self-host, fine-tune, or just call the hosted endpoints and ship something without waiting on a closed model’s pricing, policy changes, or regional availability.

What Sarvam just released (and why it matters)

Sarvam has launched:

  • Sarvam 30B: a 30 billion parameter model designed for speed and efficient deployment.
  • Sarvam 105B: a 105 billion parameter model designed for deeper reasoning, coding, and longer context work.

“Live” here matters as much as “open-source”. Live typically means you can actually use it through hosted inference (APIs, demos, product surfaces), while also having the option to pull the downloadable weights for local and enterprise deployment. Sarvam is doing both.

Why this matters for Indian AI, specifically:

  • Domestic frontier-ish capability: training large models from scratch is still hard, mostly because compute, data pipelines, and optimization work are non-trivial at this scale.
  • India-first language coverage: Sarvam states support for all 22 official Indian languages, with a strong push toward the ten most widely spoken languages during training.
  • Developer access: open weights + permissive licensing makes it realistic for startups, labs, and enterprises to build on top of these models without negotiating bespoke contracts.

What you can do today: test, download, fine-tune, run evals, and deploy behind your own product.

30B vs 105B: which model is for what

Think of these as two different tools, not just “small vs big”.

Sarvam 30B: the fast, efficient workhorse

Sarvam 30B is positioned for high throughput tasks where latency and cost matter:

  • chat and Q&A
  • summarization
  • customer support assistants
  • internal copilots for ops teams
  • voice-first short turn conversations (where quick responses matter)

Sarvam has also said Sarvam 30B activates only about 2.4B parameters at a time, which implies a sparse style of execution. In practical terms, that can reduce inference compute compared to a dense model of the same headline size, depending on how it is served.

Sarvam 105B: the heavy lifter

Sarvam 105B is positioned for complex, multi-step work:

  • reasoning heavy tasks
  • coding and debugging
  • long form synthesis across many documents
  • agentic workflows (tool use, planning, multi-step execution)
  • long context tasks (Sarvam has stated a 128,000-token context window for 105B)

This model powers Indus, Sarvam’s flagship AI assistant.

A simple explanation of “parameter count”

Parameters are basically the learned weights inside the neural network. More parameters often (not always) means more capacity. But what you feel as a user is usually:

  • latency (how fast the model responds)
  • throughput (how many tokens per second, how many concurrent users)
  • memory needs (GPU VRAM, KV cache pressure, context window cost)
  • stability at long context (does it stay coherent or drift)

So, yes, 105B is bigger, but the real difference shows up in your infrastructure bill and response time.
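To make the "infrastructure bill" point concrete, here is a back-of-envelope memory calculation. The layer count, KV-head count, and head dimension below are illustrative assumptions, not Sarvam's published specs:

```python
# Back-of-envelope serving memory for an LLM. All architecture numbers
# here are illustrative assumptions, not Sarvam's published specs.

def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone (fp16/bf16 = 2 bytes per parameter)."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache for ONE sequence: two tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1024**3

# A dense 30B model in bf16, weights only:
print(f"weights: {weight_memory_gb(30):.0f} GB")

# Hypothetical config: 48 layers, 8 KV heads, head_dim 128, 128K context:
print(f"KV cache at 128K context: {kv_cache_gb(48, 8, 128, 128_000):.1f} GB")
```

Notice the KV cache for a single long-context sequence can rival a large slice of the weight memory, which is why attention-efficiency tricks matter so much at 128K.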

A quick note on efficient deployment: some large models use techniques like Mixture-of-Experts so they do not “activate” all parameters for every token. Sarvam explicitly states an MoE Transformer architecture, so efficiency is part of the design, not an afterthought.

Under the hood: architecture choices that drive efficiency

At 30B and 105B scale, architecture decisions show up directly in serving cost. Sarvam has shared a few specifics here, and they line up with what modern large model stacks do to stay deployable.

Mixture-of-Experts (MoE), at a high level

Sarvam says these models are based on a Mixture-of-Experts Transformer.

MoE is the idea of having multiple “expert” subnetworks, but only activating a subset per token. Instead of running the entire network for every token, the model routes tokens to a few experts. You get:

  • potentially better quality for a given inference cost
  • a path to scale parameter count while keeping per-token compute lower than a dense model of the same total size

MoE is not free. Routing, expert balance, and serving complexity are real concerns. But it is one of the main ways to build large models that can still be offered at reasonable cost.
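The routing idea can be sketched in a few lines. This is a toy top-k MoE layer for illustration only (random weights, NumPy), not Sarvam's implementation:

```python
# Toy sketch of top-k Mixture-of-Experts routing: a router scores each
# token against every expert, and only the top_k experts actually run.
# Illustrative only -- random weights, not any real model's architecture.
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 8, 2, 16
router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model). Each token activates only top_k of n_experts."""
    logits = x @ router_w                          # (tokens, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, chosen[t]]
        gate = np.exp(sel - sel.max())             # softmax over selected
        gate /= gate.sum()
        for g, e in zip(gate, chosen[t]):
            out[t] += g * (x[t] @ experts[e])      # only 2 of 8 experts run
    return out

y = moe_forward(rng.standard_normal((4, d_model)))
print(y.shape)
```

Per token, only `top_k / n_experts` of the expert compute runs, which is the mechanism behind claims like "activates only ~2.4B of 30B parameters".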

Grouped Query Attention (GQA) in Sarvam 30B

Sarvam 30B uses Grouped Query Attention (GQA). The basic advantage is that it reduces the memory footprint of attention during inference, especially the KV cache. KV cache grows with context length, so anything that reduces that pressure helps with:

  • higher concurrency on the same GPU
  • longer context at usable speed
  • lower serving cost

Multi-head Latent Attention (MLA) in Sarvam 105B

Sarvam 105B incorporates Multi-head Latent Attention (MLA), which Sarvam says improves efficiency when processing longer context windows.

Attention is one of the major cost centers in Transformers, especially as context windows grow. Designs like MLA aim to reduce attention compute or memory overhead (or both), which becomes critical when you talk about 128K context and real usage, not just a spec on a slide.

What all of this translates to in the real world

When these choices work well, you usually see:

  • lower latency at a given quality level
  • better throughput per GPU
  • context scaling that does not explode your costs as quickly
  • the ability to serve “bigger model behavior” without needing absurd hardware for every deployment

Multilingual focus: Indian languages, Hinglish, and voice-first use cases

Sarvam states the models support all 22 official Indian languages and are optimized for voice-first interactions, including Hinglish.

Multilingual AI is hard in ways that benchmarks often hide:

  • multiple scripts and orthography rules
  • code-mixing (Hinglish, Tanglish, etc.) in the same sentence
  • uneven data distribution across languages and domains
  • noise from OCR and ASR pipelines
  • weak or missing evaluation sets for local intents

What “good Indian language support” should look like, in practice:

  • stable translation and transliteration behavior
  • handling mixed language queries without forcing “pure Hindi” or “pure English”
  • understanding local intents like payments, travel, KYC, government service flows
  • producing responses that are short, clear, and ASR friendly for voice bots

This is where open weights can help the ecosystem. Enterprises can evaluate on their own call logs and domains, and fine-tune for the exact tone and vocabulary their users expect.

Benchmarks to look at (and how to interpret them)

Sarvam has published several benchmark claims, especially for Sarvam 105B:

| Benchmark | Sarvam-105B | GLM-4.5-Air (106B) | GPT-OSS-120B | Qwen3-Next-80B-A3B-Thinking |
| --- | --- | --- | --- | --- |
| GENERAL | | | | |
| Math500 | 98.6 | 97.2 | 97.0 | 98.2 |
| Live Code Bench v6 | 71.7 | 59.5 | 72.3 | 68.7 |
| MMLU | 90.6 | 87.3 | 90.0 | 90.0 |
| MMLU Pro | 81.7 | 81.4 | 80.8 | 82.7 |
| Arena Hard v2 | 71.0 | 68.1 | 88.5 | 68.2 |
| IF Eval | 84.8 | 83.5 | 85.4 | 88.9 |
| REASONING | | | | |
| GPQA Diamond | 78.7 | 75.0 | 80.1 | 77.2 |
| AIME 25 (w/ tools) | 88.3 (96.7) | 83.3 | 90.0 | 87.8 |
| HMMT (Feb 25) | 85.8 | 69.2 | 90.0 | 73.9 |
| HMMT (Nov 25) | 85.8 | 75.0 | 90.0 | 80.0 |
| Beyond AIME | 69.1 | 61.5 | 51.0 | 68.0 |
| AGENTIC | | | | |
| BrowseComp | 49.5 | 21.3 | 38.0 | — |
| SWE Bench Verified (SWE-Agent Harness) | 45.0 | 57.6 | 50.6 | 34.46 |
| Tau2 (avg.) | 68.3 | 53.2 | 65.8 | 55.0 |
A few caveats when reading these numbers:

  • benchmarks can be sensitive to prompt format, tool access, and decoding settings
  • contamination risks exist across the industry, so you still want task specific evals
  • a high score does not guarantee reliability in customer support, compliance, or domain specific workflows

Where to access the models: Hugging Face, AI Kosh, and APIs

Sarvam says the models are available through:

  • Sarvam AI’s API (live hosted access)
  • Hugging Face (weights and model assets)
  • AI Kosh (India-linked distribution)

What this means for India’s AI stack: IndiaAI Mission and local infrastructure

Sarvam says these models were developed using compute resources provided through the IndiaAI Mission. That matters because training and serving large models hits real constraints:

  • GPU availability and supply chain limitations
  • power and cooling
  • high speed networking for distributed training
  • the lack of shared evaluation infrastructure for Indian language tasks

National initiatives can help by improving:

  • compute access for training and fine-tuning
  • shared datasets and evaluation suites (especially multilingual)
  • deployment blueprints for public sector and regulated industries

Sarvam’s release also has an ecosystem effect. Startups can build on top, academia can reproduce and test, enterprises can run controlled pilots, and public sector teams can evaluate models on local workflows without being blocked by closed model access.

The models were also unveiled around the India AI Impact Summit 2026, which helped put attention on domestic model building rather than only application layering.

How Sarvam stacks up against global models (and how to compare honestly)

| Benchmark | Sarvam-105B | Deepseek R1 0528 | Gemini-2.5-Flash | o4-mini | Claude 4 Sonnet |
| --- | --- | --- | --- | --- | --- |
| AIME25 | 88.3 | 87.5 | 72.0 | 92.7 | 70.5 |
| HMMT Feb 2025 | 85.8 | 79.4 | 64.2 | 83.3 | 75.6 |
| GPQA Diamond | 78.7 | 81.0 | 82.8 | 81.4 | 75.4 |
| Live Code Bench v6 | 71.7 | 73.3 | 61.9 | 80.2 | 55.9 |
| MMLU Pro | 81.7 | 85.0 | 82.0 | 81.9 | 83.7 |
| Browse Comp | 49.5 | 3.2 | 20.0 | 28.3 | 14.7 |
| SWE Bench Verified | 45.0 | 57.6 | 48.9 | 68.1 | 66.6 |
| Tau2 Bench | 68.3 | 62.0 | 49.7 | 65.9 | 64.0 |
| HLE | 11.2 | 8.5 | 12.1 | 14.3 | 9.6 |

Sarvam has made performance claims and shared benchmark results, but the cleanest approach is still to run side-by-side eval prompts on your own domain, especially for Indian language and voice scenarios.

What you can build now: assistants, Samvaad, and Indus-style experiences

Sarvam 30B already powers Samvaad, Sarvam’s conversational platform. Sarvam 105B powers Indus, the company’s assistant.

Those map to app patterns teams can build immediately:

  • chat assistants for support and sales
  • RAG systems for enterprise knowledge bases
  • voice bots for IVR and WhatsApp style interactions
  • coding copilots for internal developer platforms
  • workflow agents that call tools and execute multi-step tasks
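The RAG pattern from the list above boils down to: embed your documents, embed the query, and put the nearest chunks into the prompt. A toy sketch of just the retrieval step, using bag-of-words vectors as stand-in embeddings (a real system would call an embedding model):

```python
# Toy retrieval step of a RAG pipeline. Bag-of-words vectors stand in
# for real embeddings; the documents are invented examples.
import math
import re
from collections import Counter

docs = [
    "KYC requires an Aadhaar card and a recent photograph.",
    "Refunds are processed within 7 working days.",
    "Roaming packs can be activated by SMS.",
]

def embed(text: str) -> Counter:
    # crude "embedding": lowercase word counts
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

print(retrieve("what documents do I need for KYC?"))
```

Swap the toy `embed` for a real embedding model and feed the retrieved chunks into the LLM prompt, and you have the skeleton of an enterprise knowledge-base assistant.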

Deployment patterns are pretty straightforward:

  • hosted API if you want speed and minimal ops
  • self hosting if you need control, privacy, and predictable costs at scale
  • fine-tuning if you need domain tone, terminology, and behavior

Examples that fit Indian contexts well:

  • banking or telecom support bots that understand code-mixed queries
  • education tutors for exam prep (Sarvam has shown performance on JEE Mains 2026 style questions)
  • government service assistants that can guide users through forms, eligibility, and document lists in local languages

The bigger story: trust, openness, and geopolitics around LLMs

Open releases matter more now because AI is getting pulled into policy and procurement debates. Governments and labs are clashing over access, restrictions, and safety requirements. One high profile example in the broader AI policy conversation is the Pentagon and Anthropic dispute, which has been linked to new US guidance around government AI contracts.

In that environment, openness gives builders and organizations:

  • more resilience through local deployment options
  • reduced dependency on a single vendor’s availability or policy shifts
  • more transparent evaluation, even if training data details are never fully perfect

The practical impact is what changes for teams shipping real systems in India and outside it: more choice, more control, and more ability to audit behavior.

Wrap-up: the practical takeaway from Sarvam’s 30B and 105B open release

What’s new is clear:

  • two model sizes: Sarvam 30B and Sarvam 105B
  • live access via API and product surfaces
  • open-source weights available on Hugging Face and AI Kosh
  • Apache 2.0 licensing for broad commercial and enterprise use
  • models built and trained end to end in India, using IndiaAI Mission compute, with multilingual focus across Indian languages

How to pick:

  • Sarvam 30B for throughput, cost control, and real time assistants
  • Sarvam 105B for deeper reasoning, coding, long context, and agentic workflows

Next steps before adopting:

  • read the model card and eval methodology
  • verify license plus any usage policy notes
  • run your own benchmark prompts, especially Indian language and domain specific evals
  • test long context behavior on your real documents, not just synthetic cases
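The "run your own benchmark prompts" step can be as simple as a loop over prompt/answer pairs. A minimal harness sketch, where `dummy_model` is a placeholder for whatever client you actually use (hosted API or local inference):

```python
# Minimal side-by-side eval harness sketch: run the same prompts through
# a model callable and score substring matches. The model callable here
# is a dummy placeholder; plug in your real API or local client.

def run_eval(prompts_and_answers, call_model) -> float:
    """Return the fraction of prompts whose reply contains the answer."""
    correct = 0
    for prompt, expected in prompts_and_answers:
        reply = call_model(prompt)
        correct += expected.strip().lower() in reply.strip().lower()
    return correct / len(prompts_and_answers)

# Tiny illustrative set; real evals should use your own domain data,
# including Indian-language and code-mixed prompts.
evalset = [
    ("2 + 2 = ?", "4"),
    ("Capital of India?", "New Delhi"),
]

def dummy_model(prompt: str) -> str:  # stand-in for a real model call
    canned = {"2 + 2 = ?": "The answer is 4.",
              "Capital of India?": "New Delhi"}
    return canned[prompt]

print(run_eval(evalset, dummy_model))
```

Run the same `evalset` against each candidate model and compare the scores; substring matching is crude, so graduate to exact-match or judged scoring for anything serious.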

The broader implication is hard to miss. India now has stronger open LLM options that can actually be put into production, and that changes what local teams can build without waiting for someone else’s roadmap.

