Corpus Search and Visualization

This page gives access to a version of DoReCo that lets users search through all files of a language and visualize each text in a number of different ways. This searchable version was generated automatically from the DoReCo EAF files using the TEITOK corpus platform (see the conversion page for details).

To search the corpus, first select a language from the menu on the left. Then, on that language's page, select Search. From each language's page you can also open a list of the documents (texts) available for that language and select a document to visualize it. By default, a document is displayed as Interlinear Glossed Text (IGT); other views can be selected at the bottom of the IGT view page.

When using data from any number of DoReCo datasets in publications, the full reference for each individual dataset must be provided, including the name(s) of the creator(s) of each dataset (see https://doreco.huma-num.fr/). When publishing results obtained from DoReCo's TEITOK version, such as frequency counts obtained through the search function, please cite the following in addition to the reference(s) to the individual DoReCo dataset(s):

Janssen, Maarten & Frank Seifart. 2025. Searchable Language Documentation Corpora: DoReCo meets TEITOK. In: Éric Le Ferrand, Elena Klyachko, Anna Postnikova, Tatiana Shavrina, Oleg Serikov, Ekaterina Voloshina & Ekaterina Vylomova (eds.), Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics, 58–64. Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/2025.fieldmatters-1.5/.