TEITOK visualization and search interface for Urum


Language name: Urum (Glottocode urum1249)
Language family: Turkic (Glottocode turk1311)
Corpus creators: Stavros Skopeteas, Violeta Moisidi, Nutsa Tsetereli, Johanna Lorenz, Stefanie Schröter
Translations provided: English
Glosses: all
Annotation file licence: CC BY-NC

This is an interface for visualizing and searching the Urum DoReCo dataset. For more information about this dataset, including metadata, consult the DoReCo dataset page, where you can also download the data. Use the links in the left-side menu to search through this dataset, or to access individual documents for visualization.

When using actual data from the Urum DoReCo dataset in publications, please cite:

Skopeteas, Stavros, Violeta Moisidi, Nutsa Tsetereli, Johanna Lorenz & Stefanie Schröter. 2024. Urum DoReCo dataset. In Seifart, Frank, Ludger Paschen & Matthew Stave (eds.). Language Documentation Reference Corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). https://doreco.huma-num.fr/languages/urum1249 (Accessed on 23/01/2026). DOI:10.34847/nkl.6eaf5laq

When using results obtained from DoReCo's TEITOK version in publications, such as frequency counts obtained through the TEITOK search function, please cite the following in addition to the reference to the Urum DoReCo dataset:

Janssen, Maarten & Frank Seifart. 2025. Searchable Language Documentation Corpora: DoReCo meets TEITOK. In: Éric Le Ferrand, Elena Klyachko, Anna Postnikova, Tatiana Shavrina, Oleg Serikov, Ekaterina Voloshina & Ekaterina Vylomova (eds.), Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics, 58–64. Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/2025.fieldmatters-1.5/.

Gloss Abbreviations

Below is the list of language-specific glosses used in the Urum corpus:

Gloss | LGR equivalent | Meaning
0 | none | (unclear)
1 | 1 | first person
2 | 2 | second person
3 | 3 | third person
ABIL | none | ability/possibility
ABL | ABL | ablative
ACC | ACC | accusative
ADJR | none | participles with the adjectivalizer
AOR | none | Turkish aorist, a tense with non-definite temporal reference
COMP | COMP | complementizer
COND | COND | conditional
CONV | CVB | converb
COP | COP | copula
DAT | DAT | dative
EPST | none | epistemic
EV | none | evidential
EXIST | none | existential
F | F | feminine
FUT | FUT | future
GEN | GEN | genitive
GER | none | gerund
HESIT | none | hesitation
IMP | IMP | imperative
IMPV | none | (unclear)
INF | INF | infinitive
INST | INS | instrumental
INSTR | INS | instrumental
IPFV | IPFV | imperfective
IPM | none | (unclear)
LOC | LOC | locative
M | M | masculine
NEG | NEG | negative
NOM | NOM | nominative
NR | NMLZ | nominalizer
OPT | none | optative
PASS | PASS | passive
PFV | PFV | perfective
PL | PL | plural
POSS | POSS | possessive
POT | none | potential
PROC | none | (unclear)
PST | PST | past
PTCP | PTCP | participle
SG | SG | singular
THM | none | (unclear)