Corpus Taurinense (CT) [OER]

The Corpus Taurinense (CT) is a corpus of Old Italian (more specifically, 13th-century Florentine) comprising 259,299 tokens (21,087 types and 7,599 lemmata). It is fully lemmatized, POS-tagged, disambiguated, and marked up for text structure, literary genre and philological forms. The CT has a long history and is the first corpus we planned. It was this project that first aroused Manuel Barbera's interest in corpus linguistics and NLP, cemented his partnership with Carla Marello, and eventually set in motion the train of events that brought into existence bmanuel.org, the computational group associated with it and with Turin University, and corpora.unito.it, the hub for distributing linguistic resources. The CT was conceived by Barbera and Marello on the night of March 14th, 1998, in Padua during a meeting of ItalAnt, and was born in Stuttgart on April 29th, 2000, when the first working demo ("ANT4") was ready to be queried (the midwives were Arne Fitschen, Manuel Barbera and Ulrich Heid).

Other metadata: author: Barbera, Manuel; author: Marello, Carla; author: Tomatis, Marco; publisher: Barbera, Manuel; publisher: Università degli Studi di Torino

OER type: Metadata and online reference

Submitted by Fernando Martínez de Carnero on 30/11/2015 in the project Strumenti e tecnologie per insegnare le lingue