Corpus Search


On this page you can search the entire corpus for this language using the Corpus Query Language (CQL) of the Corpus Workbench. A basic CQL query searches for a sequence of words. Each word is represented by a pair of square brackets containing restrictions on that word. Each restriction names the feature of the word to search on, followed by a regular expression giving the desired value. For example, you can search for a word ending in the letter L followed by a word starting with the letter A as follows:

[ form = ".*l" ] [ form = "a.*" ]
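As a rough illustration of how such a query is put together and what it matches, here is a minimal Python sketch. The helper names (token, sequence, matches) are hypothetical and not part of the Corpus Workbench or of this interface; the matching function only approximates CQL by requiring each regular expression to match the whole word, and it ignores everything else a real corpus query engine does.

```python
import re


def token(**restrictions):
    """Build one CQL token such as [ form = ".*l" ] (hypothetical helper)."""
    parts = [f'{attr} = "{pattern}"' for attr, pattern in restrictions.items()]
    return "[ " + " & ".join(parts) + " ]"


def sequence(*tokens):
    """Join token expressions into a query for a sequence of words."""
    return " ".join(tokens)


def matches(words, patterns):
    """Return runs of adjacent words where each word fully matches its pattern.

    This only approximates a CQL search on the word form: each regular
    expression has to match the entire word, not just a part of it.
    """
    n = len(patterns)
    hits = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if all(re.fullmatch(p, w) for p, w in zip(patterns, window)):
            hits.append(tuple(window))
    return hits


query = sequence(token(form=".*l"), token(form="a.*"))
print(query)  # [ form = ".*l" ] [ form = "a.*" ]

words = ["a", "small", "animal", "ate", "well", "above"]
print(matches(words, [".*l", "a.*"]))
# [('small', 'animal'), ('animal', 'ate'), ('well', 'above')]
```

Note that "animal" appears in two results: once as the second word (it starts with A) and once as the first word (it ends in L), since every position in the text is tried as a possible start of the sequence.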

To facilitate searching, the interface includes a query builder that offers an easy way to define CQL queries. Click the query builder icon to open it, define your query, and click the button to insert that query into the CQL query box; you can then modify it by hand if needed, or simply hit search. The query builder also lets you build more complex CQL queries that restrict which documents or utterances are searched, for instance by limiting the results to a specific genre or to the gender of the speaker. You can also search for sentences containing a given word in the free translation tier.

For more information about the searchable fields for each language, and how their information was obtained from the original EAF files, see the conversion page.

If you use data from this search interface, please cite the following paper in addition to the generic DoReCo citations:

Searchable Spoken Corpora on 50+ Small Languages: DoReCo meets TEITOK