Workflow of a language documentation. Subhashish Panigrahi, CC-BY-SA-4.0

1. Mapping the status quo of endangered languages in areas that affect their growth, including but not limited to:

  • state language policy
  • native language education and state literacy
  • media, internet and mobile penetration
  • (digital) tools to access and contribute knowledge
  • electronic accessibility, e.g. availability of screen readers, text-to-speech and speech-to-text, and accessibility tools in public services and devices such as ATMs, bus stations and smartphones
  • open-licensed resources like corpus and audio libraries
  • available linguistic tools for machine learning and Natural Language Processing
  • organizations working for the development of the endangered languages

2. Identifying where intervention is needed, based on the mapping research. A great inspiration is the "Language Hotspot" model created by the Living Tongues Institute for Endangered Languages, which identifies "Language Hotspots" by considering a) the highest levels of linguistic diversity, b) the highest levels of endangerment, and c) the least-studied languages.
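
The three criteria above can be combined into a simple ranking. The following is only a minimal sketch, not the Living Tongues Institute's actual method: the `Region` class, the 0-to-1 scales, the equal weights and the sample regions are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical record for one zone covered by the mapping research."""
    name: str
    diversity: float      # 0..1, higher = more linguistic diversity (assumed scale)
    endangerment: float   # 0..1, higher = more endangered (assumed scale)
    documentation: float  # 0..1, higher = better studied (assumed scale)


def hotspot_score(region, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three criteria.

    A region scores higher when it is diverse, endangered and little studied,
    so the documentation level is inverted before it is added.
    """
    w_div, w_end, w_doc = weights
    total = w_div + w_end + w_doc
    return (
        w_div * region.diversity
        + w_end * region.endangerment
        + w_doc * (1.0 - region.documentation)
    ) / total


def rank_hotspots(regions, weights=(1.0, 1.0, 1.0)):
    """Return the regions sorted from highest to lowest hotspot score."""
    return sorted(regions, key=lambda r: hotspot_score(r, weights), reverse=True)


# Made-up sample values, not real survey data.
sample = [
    Region("Region A", diversity=0.9, endangerment=0.8, documentation=0.2),
    Region("Region B", diversity=0.3, endangerment=0.4, documentation=0.9),
    Region("Region C", diversity=0.6, endangerment=0.9, documentation=0.5),
]

for region in rank_hotspots(sample):
    print(f"{region.name}: {hotspot_score(region):.2f}")
```

The weights are left as a parameter because how much each criterion should count is a judgment call that the mapping research would need to inform.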

3. Toolkit development and pilot. The toolkit consists of a) a collection of FOSS software (I will try to leverage all the available software, or create some where nothing is available), b) user documentation that can be translated into other languages and used across the world, c) sample datasets from the test runs to help with using the toolkit, and d) other Open Educational Resources.

4. Train citizen archivists in select zones, and 5. Pilot and localize the toolkit. Some bilingual native speakers who are conversant in either English or an official language of their region will be trained. These "Citizen Archivists" will use the toolkit to create documentation in their languages and will help annotate it. The documentation can include either journalistic reports or different linguistic aspects (folklore, folk songs, narration of traditional games/festivals).

6. Building communities of citizen archivists by providing them with ongoing training. Their input will continuously improve the toolkit and help the community grow.

7. Audio-visual reporting by the citizen archivists.

8. Building a repository of stories that matter to the many native-language speaker communities and to language research. The annotated audio-visual documentation will not only help build a historical record of many people in their own languages, but also create resources for linguistic research to revive those languages. For instance, a recorded audio library is essential for building text-to-speech and speech-to-text engines. Such tools help not only people with visual disabilities and people who cannot read, but everyone. There are hundreds of reasons why many languages are dying. This toolkit aims at solving one problem at a time. Check out some of the frequently asked questions.

