Corpus Taurinense (CT)

The Corpus Taurinense (CT) is a corpus of Old Italian (more specifically, 13th-century Florentine) comprising 259,299 tokens (21,087 types and 7,599 lemmata). It is fully lemmatized, POS-tagged, and disambiguated, and it is marked up for text structure, literary genre, and philological forms. The CT has a long history and was the first corpus we planned. Indeed, it was this project that first aroused Manuel Barbera's interest in corpus linguistics and NLP, that cemented his partnership with Carla Marello, and that eventually set in motion the train of events leading to the creation of bmanuel.org, of the computational group associated with it and with Turin University, and of corpora.unito.it, the hub for distributing linguistic resources. The CT was conceived by Barbera and Marello on the night of March 14th, 1998, in Padua, during a meeting of ItalAnt, and was born in Stuttgart on April 29th, 2000, when the first working demo ("ANT4") was ready to be queried (the midwives were Arne Fitschen, Manuel Barbera, and Ulrich Heid).

Other metadata
author: Barbera, Manuel
author: Marello, Carla
author: Tomatis, Marco
publisher: Barbera, Manuel
publisher: Università degli Studi di Torino

Submitted by Fernando Martínez de Carnero
30/11/2015
in the project Strumenti e tecnologie per insegnare le lingue

last updated 04/12/2015

Original editing language: unknown