As global AI giants race ahead on scale, Indian start-up Sarvam AI is betting on localisation, building a sovereign LLM optimised for Indic languages, voice interfaces, and regional scripts, with government backing and benchmark results that challenge larger models in key multilingual tasks.
In April 2025, the Government of India, under the IndiaAI Mission, selected Sarvam to build India’s first sovereign Large Language Model (LLM).
As part of this, Sarvam would receive dedicated compute resources to build an indigenous foundational model from scratch. Capable of reasoning, designed for voice, and fluent in Indian languages, the model would be ready for population-scale deployment.
Dr. Pratyush Kumar, Co-founder of Sarvam, stated, “Building an AI ecosystem for India has always been core to Sarvam’s mission. As part of the Sovereign LLM proposal, we are developing three model variants: Sarvam-Large for advanced reasoning and generation, Sarvam-Small for real-time interactive applications, and Sarvam-Edge for compact on-device tasks.”
Language focus
Earlier in October 2024, the company had introduced Sarvam-1, a 2-billion-parameter language model optimised for Indian languages.
According to the company, many multilingual models require 4–8 tokens per Indic word (versus 1.4 in English), whereas Sarvam-1 reduces this to 1.4–2.1 tokens across supported languages.
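For readers who want to see how such "fertility" figures are computed, the sketch below shows one common way to measure tokens per word with Hugging Face tokenizers. The model identifiers and sample sentences are illustrative assumptions, not the exact setup Sarvam used.

```python
# Minimal sketch: measuring tokenizer "fertility" (tokens per word) on a sample
# sentence. The model IDs below are assumptions for illustration (some may be
# gated and require authentication); substitute any tokenizers you want to compare.
from transformers import AutoTokenizer

samples = {
    "english": "The farmer sold his harvest at the local market today.",
    "hindi": "किसान ने आज अपनी फसल स्थानीय बाजार में बेच दी।",
}

for model_id in ["meta-llama/Llama-3.2-3B", "sarvamai/sarvam-1"]:  # assumed IDs
    tok = AutoTokenizer.from_pretrained(model_id)
    for lang, text in samples.items():
        n_tokens = len(tok.encode(text, add_special_tokens=False))
        n_words = len(text.split())
        print(f"{model_id} | {lang}: {n_tokens / n_words:.2f} tokens/word")
```

A lower tokens-per-word ratio means a sentence consumes fewer tokens, which translates into cheaper and faster inference for the same text.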
It also achieved high accuracy on both knowledge and reasoning tasks, especially in Indic languages, and outperformed Gemma-2-2B and Llama-3.2-3B on various standard benchmarks.
Ahead of the India AI Impact Summit 2026, Sarvam has introduced a series of innovations. Sarvam-Translate supports 22 Indian languages, including Bengali, Marathi, Telugu, Maithili, Santali, Kashmiri, Nepali, Sindhi, Dogri, and Sanskrit.
The model supports paragraph-level translation across all 22 languages and translates diverse structured content in 15 of them. In human evaluations by language experts, Sarvam-Translate was rated significantly better than larger models such as Gemma3-27B-IT, Llama4 Scout, and Llama-3.1-405B-FP8.
Sarvam launched Bulbul v1, a code-mixed multilingual text-to-speech model, and followed it with Bulbul v3 this year, designed to deliver more natural, production-ready voices for Indian languages.
Audio models
According to the company’s website, Sarvam’s Text-to-Speech API, powered by Bulbul v3, supports 11 languages: Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Odia, and English. Each language offers multiple speaker voices with different characteristics.
Saaras v3 is Sarvam’s latest speech-to-text model. It auto-detects the spoken language and transcribes audio across all 22 supported Indian languages, including Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, and Odia, as well as English. It handles code-mixed audio and is optimised for both real-time and batch processing.
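As an illustration of how hosted speech services of this kind are typically consumed, the sketch below shows a generic HTTP client for text-to-speech and speech-to-text. The base URL, endpoint paths, header and parameter names, and voice/model identifiers are assumptions for illustration only; Sarvam’s published API documentation is the authoritative reference.

```python
# Illustrative sketch of calling a hosted speech API over HTTP. All endpoint
# paths, header names, and field names below are assumptions, not Sarvam's
# documented contract.
import requests

BASE_URL = "https://api.sarvam.ai"               # assumed base URL
HEADERS = {"api-subscription-key": "YOUR_KEY"}   # assumed auth header

# Text-to-speech: request audio for a Hindi sentence with a chosen voice.
tts_resp = requests.post(
    f"{BASE_URL}/text-to-speech",
    headers=HEADERS,
    json={
        "text": "नमस्ते, आपका स्वागत है।",
        "target_language_code": "hi-IN",   # assumed parameter name
        "speaker": "anushka",              # assumed voice name
        "model": "bulbul:v3",              # assumed model identifier
    },
)
tts_resp.raise_for_status()

# Speech-to-text: upload an audio file; the model auto-detects the language.
with open("call_recording.wav", "rb") as audio:
    stt_resp = requests.post(
        f"{BASE_URL}/speech-to-text",
        headers=HEADERS,
        files={"file": audio},
        data={"model": "saaras:v3"},       # assumed model identifier
    )
stt_resp.raise_for_status()
print(stt_resp.json())
```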
Last week, the company launched Sarvam Vision, expanding its sovereign model series beyond voice and text into vision. The new offering is a 3B-parameter state-space vision-language model designed for image captioning, scene text recognition, chart understanding, and complex table parsing.
While leading global vision-language models perform strongly on English documents, they often underperform on Indian languages and regional scripts, the startup said in its blog, adding that Sarvam’s 3B inference-efficient model aims to close that gap.
On the Sarvam Indic OCR Bench — comprising 20,267 document samples across 22 official Indian languages spanning historical and modern texts — the model outperformed Gemini 3 Pro, Opus 4.5, and GPT 5.2 on both word and character accuracy, measured using word error rate–based metrics.
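Word error rate, the metric family referenced here, is the word-level edit distance between a model’s output and a reference text, divided by the number of reference words; word accuracy is one minus that value, and character-level accuracy is computed the same way over characters. A minimal, benchmark-independent sketch:

```python
# Minimal sketch of word error rate (WER): word-level edit distance
# (substitutions + deletions + insertions) divided by the number of
# reference words. Standard definition, not Sarvam's benchmark harness.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One deleted word and one substitution against a 5-word reference -> WER 0.4,
# i.e. 60% word accuracy.
print(wer("भारत एक विशाल देश है", "भारत विशाल देश हैं"))  # 0.4
```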
Jaspreet Bindra, Co-Founder & CEO of AI&Beyond, noted that while modern LLMs like Sarvam, ChatGPT, and Claude are, at a foundational level, usually built on transformer-based architectures, the real difference lies less in architecture than in scale, optimisation priorities, and training data.
“Sarvam appears to be building with an India-first lens, prioritising multilingual capabilities, Indic datasets, and potentially voice-led applications. Its training strategy is likely more curated toward Indian languages, governance documents, and domain-specific corpora relevant to local enterprises,” he said.
AI industry analyst Kashyap Kompella echoed this, adding that Sarvam’s strategy can be described as “in India, for India”.
“Sarvam seems to be concentrating on enabling and unlocking a different set of Indic use cases that are not the focus of the Western frontier labs. It reports strong relative gains over its base model on Indic benchmarks, math, and programming tasks, with particularly large improvements on romanised Indic GSM-8K,” he said.
On global reasoning benchmarks such as MMLU or complex chain-of-thought evaluations, frontier models like GPT-4-class systems and Claude 3 currently set the performance standard due to their scale and advanced post-training techniques.
However, benchmark performance does not always translate directly into real-world effectiveness. In India-centric applications like multilingual customer support, public service workflows, or voice interfaces, contextual accuracy and linguistic fluency can matter more.
Sarvam’s strength may lie in task-specific optimisation and language alignment, especially in Indic languages and code-switched contexts like Hinglish.
The startup appears more focused on India-first segments — including government, BFSI, telecom, digital public infrastructure, and enterprises requiring strong multilingual support. Its go-to-market strategy may rely more on partnerships with system integrators and enterprise solution providers rather than purely self-serve APIs.
This partnership-led approach aligns well with India’s enterprise landscape, where large digital transformation projects are often delivered through ecosystem collaborations.
Published on February 11, 2026
