

Corpus and Tools for the Languages of France

Through the COLaF project (Corpus et Outils pour les Langues de France, Corpus and Tools for the Languages of France), Inria aims to contribute to the development of free corpora and tools for French and other languages of France, in close collaboration with academic and institutional partners.

The scope of COLaF covers both text data and speech and sign language data.

COLaF aims to cover French and the languages of France in all their diversity:

  • Coverage: it aims for the widest possible range of varieties, namely French from France and elsewhere, regional languages, French-based creoles (including those spoken outside France), indigenous languages, migrant languages, and French Sign Language.
  • Variation: all aspects of variation will be studied beyond the standard language, including specialised languages, diachrony and non-standard varieties (user-generated content, learner language, etc.).

Inria teams involved


In partnership with

DFKI Berlin, DFKI Saarbrücken, UMR ISIR (Institut des systèmes intelligents et de robotique), UMR PRAXLING (Appréhender les languages en tant que pratique sociale).


Slim Ouni

Scientific leader

Benoit Sagot

Scientific co-leader