CoOrAJe - Ladino Oral Corpus

Welcome to CoOrAJe - the Annotated Oral Corpus of Judeo-Spanish.

  • CoOrAJe is a multi-modal corpus in the initial phase of its development that includes oral text samples in Judeo-Spanish, enriched automatically or semi-automatically with different types of linguistic annotations.
  • CoOrAJe is being developed on the web-based framework TEITOK. Each audio recording is enriched with orthographic attributes (textual annotations: transcription, a normalized form in current spelling, and the equivalent form in modern Spanish spelling) and levels of linguistic analysis (linguistic annotations: POS tag, Judeo-Spanish lemma, and the equivalent Spanish lemma).
  • Automatic tokenization is carried out in the Text Encoding Initiative (TEI) format, with a slightly modified tokenization system, and tagging is done semi-automatically with NeoTag. Each token is also provided with a linguistically verified POS tag, using a tagset that I created specifically for Judeo-Spanish on the basis of the EAGLES tagset for Spanish, along with one lemmatized form for Judeo-Spanish and another for modern Spanish.
  • Information about each audio recording, along with other structural, technical, and administrative data, is provided in a set of metadata.
  • The search interface enables searching for words, lemmas, POS tags, and combinations of these. Searches can cover the whole set of documents or be restricted to documents matching a specific date, location, or other metadata criteria. CoOrAJe recognizes all kinds of linguistic and orthographic variants of a word in the query and enables descriptive statistics.
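
The token-level annotation and attribute-based search described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the element name `tok`, the attribute names (`form`, `norm`, `es`, `pos`, `lemma`, `eslemma`), the placeholder POS value `NOUN`, and the three sample tokens are hypothetical and do not reproduce the actual CoOrAJe encoding, its tagset, or the TEITOK query engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names, attribute names, and the "NOUN" tag
# are illustrative assumptions, not the real CoOrAJe schema or tagset.
SAMPLE = """
<s>
  <tok form="kaza" norm="kaza" es="casa" pos="NOUN" lemma="kaza" eslemma="casa"/>
  <tok form="kazas" norm="kazas" es="casas" pos="NOUN" lemma="kaza" eslemma="casa"/>
  <tok form="ijo" norm="ijo" es="hijo" pos="NOUN" lemma="ijo" eslemma="hijo"/>
</s>
"""


def search(root, **criteria):
    """Return the tokens whose attributes equal every given criterion."""
    return [
        tok
        for tok in root.iter("tok")
        if all(tok.get(name) == value for name, value in criteria.items())
    ]


root = ET.fromstring(SAMPLE)

# Querying by the modern Spanish lemma retrieves every Judeo-Spanish form
# that has been annotated with that lemma.
hits = search(root, eslemma="casa")
print([tok.get("form") for tok in hits])

# Criteria can be combined, e.g. POS tag plus Spanish lemma.
combined = search(root, pos="NOUN", eslemma="hijo")
print([tok.get("form") for tok in combined])
```

Storing the Judeo-Spanish lemma and the modern Spanish lemma side by side on each token is what allows a single query to gather different surface forms of the same word, as in the first call.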


How to cite this corpus

CoOrAJe - the Annotated Oral Corpus of Judeo-Spanish. Director: Aldina Quintana. Available online at Accessed on [date].