Indic Trans2

AI4Bharat's Indic-Trans-v2 is a multilingual Transformer NMT model (~1.1B parameters) trained on the Samanantar v2 dataset, which was the largest publicly available collection of parallel corpora for the languages of India at the time of writing (23 March 2023). Two models are currently released, Indic to English and English to Indic, and together they support all 22 scheduled languages of India.
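
Because the release consists of two directional models, a caller has to pick one based on the language pair. The following is a minimal sketch of that routing step, not code from the IndicTrans2 repository: the function name `pick_model`, the labels `"en-indic"` and `"indic-en"`, and the use of English language names as keys are all assumptions made for illustration.

```python
# Hypothetical routing helper; names and labels are illustrative only.
# The 22 scheduled languages of India, keyed by their English names.
SCHEDULED_LANGUAGES = frozenset({
    "Assamese", "Bengali", "Bodo", "Dogri", "Gujarati", "Hindi",
    "Kannada", "Kashmiri", "Konkani", "Maithili", "Malayalam", "Manipuri",
    "Marathi", "Nepali", "Odia", "Punjabi", "Sanskrit", "Santali",
    "Sindhi", "Tamil", "Telugu", "Urdu",
})


def pick_model(source: str, target: str) -> str:
    """Return a label for the directional model that covers the pair."""
    if source == "English" and target in SCHEDULED_LANGUAGES:
        return "en-indic"
    if source in SCHEDULED_LANGUAGES and target == "English":
        return "indic-en"
    # Any other pair is not covered by a single model in this sketch.
    raise ValueError(f"No single released model covers {source} -> {target}")


if __name__ == "__main__":
    print(pick_model("English", "Tamil"))
    print(pick_model("Santali", "English"))
```

In this sketch a pair such as Hindi to Tamil raises an error, since the description above lists only the Indic to English and English to Indic directions.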

About Model

Bhashini - IndicTrans2 is the first open-source Transformer-based multilingual NMT model that supports high-quality translation across all 22 scheduled Indic languages, including multiple scripts for low-resource languages such as Kashmiri, Manipuri and Sindhi. It adopts script unification wherever feasible to enable transfer learning through lexical sharing between languages. Overall, the model supports five scripts: Perso-Arabic (Kashmiri, Sindhi, Urdu), Ol Chiki (Santali), Meitei (Manipuri), Latin (English), and Devanagari (used for all the remaining languages).
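
To make the idea of script unification concrete, here is a minimal sketch of one common way to map text from another Brahmi-derived script into Devanagari. It relies on the fact that Unicode lays out these scripts in parallel 128-code-point blocks, so a character can be shifted by a fixed offset. This is an assumption about how unification could be done, not the IndicTrans2 preprocessing code; the function name `to_devanagari` and the table `SCRIPT_BLOCK_START` are hypothetical, and the mapping is approximate because some code points have no Devanagari counterpart.

```python
# Illustrative sketch only; not the authors' implementation.
DEVANAGARI_START = 0x0900
BLOCK_SIZE = 0x80  # each of these Unicode script blocks spans 128 code points

# Start of the Unicode block for each source script.
SCRIPT_BLOCK_START = {
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Odia": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}


def to_devanagari(text: str, script: str) -> str:
    """Shift characters from the given script's block into the Devanagari block."""
    start = SCRIPT_BLOCK_START[script]
    converted = []
    for ch in text:
        code_point = ord(ch)
        if start <= code_point < start + BLOCK_SIZE:
            # Same offset within the block, but in the Devanagari range.
            converted.append(chr(DEVANAGARI_START + (code_point - start)))
        else:
            # Spaces, punctuation, Latin text, etc. pass through unchanged.
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    print(to_devanagari("\u0995", "Bengali"))  # Bengali KA
    print(to_devanagari("\u0BA4", "Tamil"))    # Tamil TA
```

After such a mapping, cognate words in different languages are more likely to share subword units, which is the lexical sharing that the paragraph above refers to. Scripts listed separately above (Perso-Arabic, Ol Chiki, Meitei, Latin) are left out of the table because they are kept as-is.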

We open-source all our training data (BPCC), back-translation data (BPCC-BT), the final IndicTrans2 models, evaluation benchmarks (IN22, which includes IN22-Gen and IN22-Conv), and training and inference scripts for easier use and adoption within the research community. We hope this will foster more research on low-resource Indic languages and further improve the quality of low-resource translation through contributions from the research community.

This code repository contains instructions for downloading the artifacts associated with IndicTrans2, as well as the code for training/fine-tuning the multilingual NMT models.

For more details on using the model, see the GitHub repository: https://github.com/AI4Bharat/IndicTrans2/tree/main

Metadata

MIT

AI4Bharat

Machine Translation Model

Other

Open

Sector Agnostic

05/03/25 15:24:29

Admin

214.60 KB

Activity Overview

  • Downloads0
  • Downloads 16
  • Views 317
  • File Size 214.60 KB

Tags

  • Machine Translation
  • Language Modeling
  • Bilingual Translation
  • Multilingual Translation
  • Regional Languages
  • Indian Languages
  • Indic-TransV2
  • NLP
  • Computational Linguistics

License Control

MIT

Version Control

Version 1 (214.60 KB)
  • admin · 1 year(s) ago