Corpora for ASR in the 5 Most Spoken Languages in the World
According to the "Anuario 2013" of the "Instituto Cervantes"1 and the
"Atlas de la lengua española en el mundo"2, the five most spoken
languages in the world are Mandarin Chinese, English, Spanish, Hindi, and Arabic.
In this section, we therefore present comparison tables of several corpora in these
five languages, obtained from the Linguistic Data Consortium (LDC) and the European Language Resources Association (ELRA).
1. The Instituto Cervantes (http://www.cervantes.es/) is a public organization founded on March 21, 1991 by the government of Spain and sponsored by the King of Spain. It is attached to the "Ministerio de Asuntos Exteriores", and its main goal is to promote the teaching of the Spanish language and the culture of Spain and Hispanic America throughout the world.
2. http://cvc.cervantes.es/lengua/anuario/anuario_13/