CoOrAJe - Ladino Oral Corpus
Welcome to CoOrAJe - the Annotated Oral Corpus of Judeo-Spanish.
- CoOrAJe is a multi-modal corpus in the initial phase of its development that includes oral text samples in Judeo-Spanish, enriched automatically or semi-automatically with different types of linguistic annotations.
- CoOrAJe is being developed on the web-based framework TEITOK. Each audio recording is enriched with orthographic attributes (textual annotations: a transcription, a normalized form in current spelling, and the equivalent form in modern Spanish spelling) and with levels of linguistic analysis (linguistic annotations: a POS tag, a Judeo-Spanish lemma, and the equivalent Spanish lemma).
- Automatic tokenization is carried out in the Text Encoding Initiative (TEI) format, with a slightly modified tokenization system, and tagging is done semi-automatically with NeoTag. Each token is also provided with a linguistically verified POS tag, using a tagset that I have created specifically for Judeo-Spanish on the basis of the EAGLES tagset for Spanish, and with two lemmatized forms: one for Judeo-Spanish and one for modern Spanish.
- Information about each audio recording, together with other structural, technical, and administrative data, is provided in a set of metadata.
- The search interface supports queries for words, lemmas, POS tags, and combinations of these, either across the whole set of documents or restricted to documents that match a specific date, location, or other metadata criterion. CoOrAJe recognizes all kinds of linguistic and orthographic variants of a word in the query and supports descriptive statistics.
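The annotation layers and the combined searches described above can be pictured with a minimal Python sketch. It is an illustration only, not the corpus's actual schema or search engine: the `tok` element, the attribute names (`form`, `nform`, `esform`, `pos`, `lemma`, `eslemma`), the sample tokens and their tags, and the helper functions are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical token markup. Element names, attribute names, and the sample
# values are assumptions for illustration, not the documented corpus schema.
SAMPLE = """
<s>
  <tok id="w-1" form="la" nform="la" esform="la" pos="DA0FS0" lemma="el" eslemma="el">la</tok>
  <tok id="w-2" form="kaza" nform="kaza" esform="casa" pos="NCFS000" lemma="kaza" eslemma="casa">kaza</tok>
</s>
"""

def token_layers(xml_text):
    """Return one dict per token holding all of its annotation layers."""
    root = ET.fromstring(xml_text)
    return [dict(tok.attrib, text=tok.text or "") for tok in root.iter("tok")]

def search(tokens, **criteria):
    """Keep the tokens whose layers match every given attribute=value pair."""
    return [t for t in tokens if all(t.get(k) == v for k, v in criteria.items())]

tokens = token_layers(SAMPLE)

# Combine two layers in one query: a POS tag and a modern Spanish lemma.
hits = search(tokens, pos="NCFS000", eslemma="casa")
for hit in hits:
    print(hit["id"], hit["form"], hit["lemma"], hit["eslemma"])
```

Keeping every layer as an attribute of the same token is what makes it possible, in this sketch, to query by the Judeo-Spanish form, by the modern Spanish equivalent, or by both at once.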
How to cite this corpus
CoOrAJe - the Annotated Oral Corpus of Judeo-Spanish. Director: Aldina Quintana. Available online at http://corptedig-glif.upf.edu/teitok/cooraje/ Accessed on [date].