TEITOK visualization and search interface for Jahai


Language name: Jahai (jeha1242)
Language family: Austroasiatic (aust1307)
Corpus creator: Burenhult, Niclas
Translations provided: English
Glosses: some
Annotation file licence: CC BY-NC-SA

This is an interface for visualizing and searching the Jahai DoReCo dataset. For more information about this dataset, including metadata, consult the DoReCo dataset page, where you can also download the data. Use the links in the left-side menu to search through this dataset, or to access individual documents for visualization.

When using actual data from the Jahai DoReCo dataset in publications, please cite:

Burenhult, Niclas. 2024. Jahai DoReCo dataset. In Seifart, Frank, Ludger Paschen and Matthew Stave (eds.). Language Documentation Reference Corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). https://doreco.huma-num.fr/languages/jeha1242 (Accessed on 23/01/2026). DOI:10.34847/nkl.6eaf5laq

When using results obtained from DoReCo's TEITOK version in publications, such as frequency counts obtained through the TEITOK search function, please cite the following in addition to the reference to the Jahai DoReCo dataset:

Janssen, Maarten & Frank Seifart. 2025. Searchable Language Documentation Corpora: DoReCo meets TEITOK. In: Éric Le Ferrand, Elena Klyachko, Anna Postnikova, Tatiana Shavrina, Oleg Serikov, Ekaterina Voloshina & Ekaterina Vylomova (eds.), Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics, 58–64. Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/2025.fieldmatters-1.5/.

Gloss Abbreviations

Below is the list of language-specific glosses used in the Jahai corpus, with the corresponding Leipzig Glossing Rules (LGR) label where one exists:

Gloss | LGR | Meaning
1 | 1 | first person
2 | 2 | second person
3 | 3 | third person
ADV | ADV | adverbial
AFF | (none) | affirmative
CAUS | CAUS | causative
CLF | CLF | classifier
COLL | (none) | collective
CONT | (none) | continuative
CONTR | (none) | contrastive
D | DU | dual
DEM | DEM | demonstrative
DIS | (none) | disjunct
DIST | DIST | distal
DISTR | DISTR | distributive
EMP | (none) | emphatic
EQU | (none) | equative
EXCL | EXCL | exclusive
EXP | (none) | (unclear)
FAM | (none) | familiar
GOAL | (none) | goal
HORT | (none) | hortative
ID | (none) | identification
IMM.FUT | (none) | immediate future
IMPF | IPFV | imperfective
INCL | INCL | inclusive
INSTR | INS | instrumental
IRR | IRR | irrealis
ITER | (none) | iterative
LOC | LOC | locative
NEG | NEG | negative
NOM | NOM | nominative
P | PL | plural
PAST | PST | past
PERF | PRF | perfective
PROG | PROG | progressive
PROH | PROH | prohibitive
PROP | (none) | property
Q | Q | question marker
REDR | (none) | (unclear)
REL | REL | relative
RP | (none) | root possibility
RT | (none) | relational tense
S | SG | singular
SOURCE | (none) | source
SUBJ | SBJ | subject
UNIT | (none) | unitiser
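When comparing Jahai search results with other DoReCo corpora, it can help to apply the gloss table mechanically. The sketch below is a minimal illustration, not part of TEITOK or DoReCo. It assumes that grammatical labels inside a gloss are joined by a single separator (a full stop by default), and the helper name `to_lgr` and the example input "1.P.EXCL" are hypothetical. Labels whose LGR column reads "(none)" are left out of the mapping, so they pass through unchanged, as do lexical glosses.

```python
# Jahai-specific gloss -> Leipzig Glossing Rules (LGR) label, copied from the
# table above. Glosses with no LGR equivalent are deliberately omitted.
JAHAI_TO_LGR = {
    "1": "1", "2": "2", "3": "3",
    "ADV": "ADV", "CAUS": "CAUS", "CLF": "CLF",
    "D": "DU", "DEM": "DEM", "DIST": "DIST", "DISTR": "DISTR",
    "EXCL": "EXCL", "IMPF": "IPFV", "INCL": "INCL", "INSTR": "INS",
    "IRR": "IRR", "LOC": "LOC", "NEG": "NEG", "NOM": "NOM",
    "P": "PL", "PAST": "PST", "PERF": "PRF", "PROG": "PROG",
    "PROH": "PROH", "Q": "Q", "REL": "REL", "S": "SG", "SUBJ": "SBJ",
}


def to_lgr(gloss: str, sep: str = ".") -> str:
    """Translate a separator-delimited gloss into LGR labels (hypothetical helper).

    Assumes category labels are joined by `sep`; any part not found in
    JAHAI_TO_LGR (lexical glosses, labels without an LGR equivalent) is
    returned unchanged.
    """
    return sep.join(JAHAI_TO_LGR.get(part, part) for part in gloss.split(sep))


if __name__ == "__main__":
    print(to_lgr("1.P.EXCL"))  # hypothetical input; prints 1.PL.EXCL
    print(to_lgr("IMPF"))      # prints IPFV
```

Keeping unmapped labels such as CONT or IMM.FUT as they are avoids silently merging Jahai-specific categories into unrelated LGR labels. Note that IMM.FUT itself contains a full stop, so a gloss using it would need a different separator or a pre-tokenised input.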