iTILT Interactive Technologies in Language Teaching
iTILT is a European project on Interactive Technologies in Language Teaching which focuses on the use of interactive whiteboards in the communicative language classroom. View over two hundred examples of classroom practice including video of class activities, lesson plans and files, and commentary from the teachers and learners involved. See interactive teaching with technology for different languages, proficiency levels, and age groups from seven European countries, helping teachers gain confidence with technology in communicative language teaching.
Wikisaurus - Wiktionary
A Wiktionary subproject and wiki namespace that aims to create a thesaurus: a dictionary of synonyms, antonyms and further semantically related terms such as hyponyms, hypernyms, meronyms and holonyms. Such a thesaurus is mainly meant to help anyone who writes for a living or for fun (writers, managers, contributors to wikis, bloggers, and writers of love letters) find words they cannot recall, or do not even know, by starting from words that are semantically related to the one they are looking for.
KiezDeutsch-Korpus (KiDKo)
KiDKo is a multi-modal digital corpus of spontaneous discourse data from informal, oral peer group situations in multi- and monoethnic speech communities. KiDKo contains audio data from self-recordings, with aligned transcriptions (i.e., at every point in a transcript, one can access the corresponding area in the audio file). KiDKo offers a new empirical resource for research in domains such as: Kiezdeutsch as a multiethnic dialect of German; youth language in urban areas; linguistic developments in contemporary German; informal language use. KiDKo consists of two parts: the main corpus with spontaneous conversations between young people from a multiethnic community (Berlin-Kreuzberg); a complementary corpus with spontaneous conversations between young people from a monoethnic community with comparable socio-economic indicators (Berlin-Hellersdorf). KiDKo has been developed by project B6 (PI: Heike Wiese) of the collaborative research centre Information Structure (SFB 632) at the University of Potsdam since 2008. Website in German and English. Corpus access online via ANNIS (open source platform). © by kiezdeutschkorpus.de
NEOROM: Red de Observatorios de neología de las lenguas románicas
Allows searching the NEOROM network's database of neologisms drawn from the daily written press in all the Romance languages, collected since 2005. It is possible to find neologisms in Catalan, Spanish, Galician, Italian, French (France, Belgium and Quebec), Portuguese (Portugal and Brazil) and Romanian.
Knorpora
Knorpora is a modified version of the Knoppix 3.3 Live CD for students of corpus-based computational linguistics.
Like Knoppix, the Knorpora CD allows you to run a fully operational Debian/Linux operating system from the CD-ROM drive, without installing anything on the computer.
The Knorpora edition of Knoppix contains programs and data files that should be of interest to computational linguistics students (WordNet, the Natural Language Toolkit, taggers, etc.)
When you launch Knorpora on your computer, you can immediately start learning how to use a UNIX-like operating system while at the same time experimenting with the kind of UNIX-command-line-based NLP tools that make UNIX the ideal operating system for NLP work.
Even if you already work with UNIX, Knorpora may give you the chance to try some interesting software before you install it.
Corpus del Español Actual
The Corpus del Español Actual (the Corpus of Contemporary Spanish) contains 540 million words, which have been lemmatized and tagged with detailed part-of-speech information. The CEA is made up of the following texts:
The Spanish part of the eleven-language parallel corpus Europarl: European Parliament Proceedings Parallel Corpus, v. 6 (1996-2010);
The Spanish portion of the trilingual Wikicorpus, v. 1.0, which was extracted from a snapshot of Wikipedia (2006); and
The Spanish part of the seven-language parallel corpus MultiUN: Multilingual UN Parallel Text 2000-2009, a corpus made up of the resolutions of the United Nations.
The CEA was tagged using an online Spanish dictionary containing 635,000 wordforms, which was automatically generated from a dictionary of 86,000 single-word lemmas (e.g., unir, inmoralidad, allí) and 26,000 multiword lemmas (e.g., muerte cerebral, carga de profundidad, de armas tomar) (Subirats 1989, 1992, 1994a, 1994b; Mogorrón 1994; Garrido 1999; Bobes 2000). Tag disambiguation was carried out with intersecting finite-state automata using lexical and syntactic information (Subirats 1998, Subirats and Ortega 2000, 2001, Ortega in progress).
FrameNet
The FrameNet project is building a lexical database of English that is both human- and machine-readable, based on annotating examples of how words are used in actual texts. From the student's point of view, it is a dictionary of more than 10,000 word senses, most of them with annotated examples that show the meaning and usage. For the researcher in Natural Language Processing, the more than 170,000 manually annotated sentences provide a unique training dataset for semantic role labeling, used in applications such as information extraction, machine translation, event recognition, sentiment analysis, etc. For students and teachers of linguistics it serves as a valence dictionary, with uniquely detailed evidence for the combinatorial properties of a core set of the English vocabulary. The project has been in operation at the International Computer Science Institute in Berkeley since 1997, supported primarily by the National Science Foundation, and the data is freely available for download; it has been downloaded and used by researchers around the world for a wide variety of purposes (See FrameNet users).
FrameNet is based on a theory of meaning called Frame Semantics, deriving from the work of Charles J. Fillmore and colleagues (Fillmore 1976, 1977, 1982, 1985, Fillmore and Baker 2001, 2010). The basic idea is straightforward: that the meanings of most words can best be understood on the basis of a semantic frame: a description of a type of event, relation, or entity and the participants in it.
The COSMAS corpora
COSMAS (Corpus Search, Management and Analysis System) is a large collection of German text corpora developed at the Mannheim IDS (Institut für deutsche Sprache). With a size of almost two billion words, this is the world’s largest, ever-growing collection of German online corpora for linguistic research. The collection covers a wide variety of sources, e.g. classic literary texts, national and regional newspapers, transcribed spoken language, morpho-syntactically annotated texts and several unique corpora.
DIRAE
Dirae is a reverse dictionary based on the Diccionario de la lengua española of the Real Academia Española. It is called reverse because, instead of looking up the definition of a word as in an ordinary dictionary, it finds words by searching the text of their definitions. With well-chosen search terms, Dirae can also serve as an associative thesaurus, an etymological search engine, a synonym finder, a finder of speech, and a tool for other lexical functions.
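The reverse lookup described in this entry can be sketched in a few lines of Python. This is a minimal illustration only, not Dirae's implementation: the `ENTRIES` mini-dictionary and the `reverse_lookup` function are hypothetical, and the definitions are invented English glosses rather than text from the Diccionario de la lengua española.

```python
import re

# Hypothetical mini-dictionary: headword -> definition (invented glosses).
ENTRIES = {
    "perro": "domestic mammal of the canid family that barks",
    "gato": "small domestic feline mammal that meows",
    "lobo": "wild mammal of the canid family that howls",
}


def tokenize(text):
    """Lowercase the text and return its word tokens as a set."""
    return set(re.findall(r"\w+", text.lower()))


def reverse_lookup(query, entries=ENTRIES):
    """Return the headwords whose definition contains every query term."""
    terms = tokenize(query)
    return sorted(
        headword
        for headword, definition in entries.items()
        if terms <= tokenize(definition)
    )


print(reverse_lookup("canid mammal"))     # ['lobo', 'perro']
print(reverse_lookup("domestic feline"))  # ['gato']
```

Requiring every query term to appear in the definition is one simple matching rule; a real reverse dictionary would likely also handle inflected forms.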
OPUS: the open parallel corpus
OPUS is a growing collection of translated texts from the web. In the OPUS project we try to convert and align free online data, to add linguistic annotation, and to provide the community with a publicly available parallel corpus. OPUS is based on open source products and the corpus is also delivered as an open content package. We used several tools to compile the current collection. All pre-processing is done automatically. No manual corrections have been carried out.
Russian National Corpus
This website contains a corpus of the modern Russian language incorporating over 300 million words. The corpus of Russian is a reference system based on a collection of Russian texts in electronic form.
The Corpus is intended for all who are interested in the Russian language and various associated fields: professional linguists, language teachers, school and university students, foreigners learning the language.
EMILLE (Enabling Minority Language Engineering)
EMILLE (Enabling Minority Language Engineering) was a three-year EPSRC project at Lancaster University and Sheffield University. Its end product was a 97-million-word electronic corpus of South Asian languages, especially those spoken in the UK.
BoLC - Bononia Legal Corpus
The Bononia Legal Corpus - BoLC - is the result of an on-going research project. It is aimed at the construction and analysis of a multilingual comparable legal corpus. It is being developed at the University of Bologna. It has been coordinated by Rema Rossini Favretti and Fabio Tamburini. John Sinclair played a crucial role as consultant. We wish to thank Adriano Di Pietro for his contribution during the corpus design and implementation.
Protégé
A free, open-source ontology editor and framework for building intelligent systems
Protégé is supported by a strong community of academic, government, and corporate users, who use Protégé to build knowledge-based solutions in areas as diverse as biomedicine, e-commerce, and organizational modeling.
AGTK: Annotation Graph Toolkit
Annotation Graphs are a formal framework for representing linguistic annotations of time series data. Annotation graphs abstract away from file formats, coding schemes and user interfaces, providing a logical layer for annotation systems.
linguatools
Translation in context. Ontology building. Semantic similarity. Visualization of lexical fields.
DISCO (extracting DIstributionally related words using CO-occurrences): compute semantic similarity between words
DISCO is a Java application that retrieves the semantic similarity between arbitrary words and phrases. The similarities are based on the statistical analysis of very large text collections. The tool runs on all popular operating systems, including Windows, Linux, Solaris, and MacOS.
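To give a feel for co-occurrence-based similarity, here is a small generic sketch in Python. It does not reproduce DISCO's Java API or its actual statistics, which this entry does not specify; the function names, the window size and the three-sentence toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy corpus (hypothetical); real systems use very large text collections.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))
print(cosine(vecs["cat"], vecs["milk"]))
```

Words that appear in similar contexts end up with similar count vectors, which is why the cosine of the two vectors can stand in for semantic relatedness.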
LDC Online: Tools
LDC Corpus Search: Search our archive of newstext in Arabic, Chinese, and English as well as English telephone conversations from Fisher and Switchboard
American English Spoken Lexicon: Search and listen to audio files for more than 50,000 of the most common words in English.
Informatica per la didattica: cinese ed arabo.
Using computers for teaching Italian language to foreigners: Chinese and Arabic.
Informatica per la didattica: fondamenti di HTML.
HTML Basics for the teaching of modern foreign languages.
Corpora e linguistica in rete
This book offers a general overview of theoretical, organizational and research topics concerning the construction and use of the electronic POS-tagged corpus "Corpus Taurinense". The corpus covers texts in Old Italian (Italiano antico).
Molti occhi sono meglio di uno: saggi di linguistica generale 2008-12.
The essays collected here are mainly unpublished and were all written in the last five years. They cover five perspectives on the different horizons of general linguistics: from America to the Far East, from historical linguistics to the history of linguistics, from generative grammar to corpus linguistics.
Contents: 0a. Introduzione; 0b. Prefazione di Franco Crevatin; 1. Tassonomia, filogenesi ed altro: la classificazione linguistica del Nordamerica; 2. Per una soluzione teorica e storica dei rapporti tra grammatica generativa e linguistica dei corpora; 3. Anafora e deissi in diacronia: il caso del voto; 4. Una introduzione ai NUNC: storia della creazione di un corpus; 5. Il Prete Gianni ed i kitan neri: una nota.
Schema e storia del "Corpus Taurinense". Linguistica dei corpora dell'italiano antico.
The full documentation of the Corpus Taurinense and the most extensive grammatical introduction to Old Italian available from a corpus linguistics perspective. More than: 1,000 pages; 4,000 full quotations; 250 Old Italian texts quoted; 500 CQP queries.
ToBI
ToBI is a framework for developing community-wide conventions for transcribing the intonation and prosodic structure of spoken utterances in a language variety. A ToBI framework system for a language variety is grounded in careful research on the intonation system and the relationship between intonation and the prosodic structures of the language (e.g., tonally marked phrases and any smaller prosodic constituents that are distinctively marked by other phonological means).
Correlatore
Correlatore is a program written by Paolo Mairano in Tcl/Tk for analysing the rhythm of speech data that has already been labeled. It automatically calculates several rhythm correlates (%V, ΔC, ΔV, Varcos, PVIs, CCIs; see the documentation) from the annotation file produced by Praat. Anyone researching the acoustic correlates of rhythm therefore only needs to label the sound files in Praat and then open the label file with Correlatore to obtain the values of the correlates and, if desired, plot graphs.
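As a rough illustration of what some of these correlates measure, the sketch below computes %V, ΔC, ΔV and VarcoC from a hand-typed list of interval durations. It is not Correlatore's code and does not read Praat files: the `intervals` data and the `rhythm_correlates` function are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical labeled intervals: (label, duration in seconds),
# where "C" marks a consonantal interval and "V" a vocalic one.
intervals = [
    ("C", 0.08), ("V", 0.12),
    ("C", 0.10), ("V", 0.06),
    ("C", 0.12), ("V", 0.12),
]


def rhythm_correlates(intervals):
    """Compute a few rhythm correlates from labeled interval durations."""
    c = [d for label, d in intervals if label == "C"]
    v = [d for label, d in intervals if label == "V"]
    total = sum(c) + sum(v)
    return {
        # %V: share of the total duration taken up by vocalic intervals.
        "%V": 100 * sum(v) / total,
        # deltaC / deltaV: standard deviation of interval durations
        # (population standard deviation is an assumption here).
        "deltaC": pstdev(c),
        "deltaV": pstdev(v),
        # VarcoC: deltaC normalised by the mean consonantal duration.
        "VarcoC": 100 * pstdev(c) / mean(c),
    }


for name, value in rhythm_correlates(intervals).items():
    print(f"{name}: {value:.4f}")
```

Normalised measures such as VarcoC divide the standard deviation by the mean duration, which makes them less sensitive to speech rate than the raw deltas.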
MorFo
MorFo (Morfemi compositivi e derivativi Fondamentali Online) is a glossary of more than 300 Italian derivational and compositional prefixes and suffixes, each with definitions and examples.
It helps teachers who want to exercise their students' comprehension of texts.
It is also a self-training tool: it forces the teacher to focus on the minimal units that make up a word.
VINCA. "Varietà di Italiano di Nativi Corpus Appaiato"
VINCA is a Corpus of Native Written Italian, freely available and queryable online. Devised by Manuel Barbera and Carla Marello, soon joined by Elisa Corino, VINCA was born in 2004 as a paired corpus for VALICO. The project is now under different direction (C. Marello and E. Corino only) and has migrated to another website: http://www.valico.org/. Only the old homepage of the first version of the project and its original Guidelines are maintained here, mainly for historical documentation.
VALICO. "Varietà di Apprendimento della Lingua Italiana: Corpus Online"
VALICO is an international Learner Corpus of Italian, freely available and queryable online. Devised by Manuel Barbera and Carla Marello, soon joined by Elisa Corino, VALICO was born on the 17th of June 2003. The project is now under different direction (C. Marello and E. Corino only) and has migrated to another website: http://www.valico.org/. Only the old homepage of the first version of the project and its original Guidelines are maintained here, mainly for historical documentation.
NUNC - A Multilanguage Suite of Newsgroups Corpora
NUNC is a multilingual (It. De. Fr. En. Es. Ma. Su. Ee. Pt.) suite of corpora based on the language of newsgroups, freely available and queryable online. Devised by Manuel Barbera, NUNC was born in 2002, and is currently under development by A. Allora, M. Barbera, S. Colombo, E. Corino, C. Marello, S. Casavecchia, C. Onesti, M. Tomatis, L. Valle and others. Some corpora are already available for testing (Italian, UK English, French and Spanish).
Corpus Taurinense (CT)
The Corpus Taurinense (CT) is a corpus of Old Italian (more specifically XIII century Florentine) of 259,299 tokens (21,087 types and 7,599 lemmata). It is fully lemmatized, POS-tagged, disambiguated, and marked up for text structure, literary genre and philological forms.
The CT has a long history and is the first corpus we planned. As a matter of fact, it was this project that first aroused Manuel Barbera's interest in Corpus Linguistics and NLP, that cemented his partnership with Carla Marello, and that eventually set in motion the train of events which brought into existence bmanuel.org, the computational group associated with it and with Turin University, and corpora.unito.it, the hub for distributing linguistic resources.
The CT was conceived by Barbera and Marello on the night of March 14th, 1998 in Padua during a meeting of ItalAnt, and was born in Stuttgart on April 29th, 2000, when the first working demo ("ANT4") was ready for interrogation (midwives were Arne Fitschen, Manuel Barbera and Ulrich Heid).
Voyant Tools
Voyant Tools is a web-based text reading and analysis environment. It’s designed to make it easy for you to work with your own text or collection of texts in a variety of formats, including plain text, HTML, XML, PDF, RTF, and MS Word. You can also work with an existing collection of texts like Shakespeare (click the “Open” button on the main page to see other pre-defined collections of texts).
Termisti: Microglossaires consultables.
The Termisti centre defines itself as an applied linguistics research centre whose research topics correspond to various issues in the training of future translators and interpreters. Its research activities feed into teaching at ISTI, and students, Belgian or foreign, regularly carry out practical training there. The name of the centre, tied to its history, clearly evokes terminology research. The evolving interests of its members have led them to redefine its research missions around problems of mutual comprehension (translation) in a multilingual society:
Terminography and language planning;
Linguistic engineering applied to questions of lexicography, multilingual terminology and localization;
French as a foreign language and language teaching.
The tools available in the Microglossaires consultables relate to these issues.
NLTK Corpora
NLTK has built-in support for dozens of corpora and trained models.
MultiWordNet
MultiWordNet is a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet 1.6.
The Italian synsets are created in correspondence with the Princeton WordNet synsets, whenever possible, and semantic relations are imported from the corresponding English synsets; i.e., we assume that if there are two synsets in PWN and a relation holding between them, the same relation holds between the corresponding synsets in Italian. While the project stresses the usefulness of a strict alignment between wordnets of different languages, the multilingual hierarchy implemented is able to represent true lexical idiosyncrasies between languages, such as lexical gaps and denotation differences.
The information contained in the database can be browsed through the MultiWordNet browser, which facilitates the comparison of the lexica of the aligned languages. The MultiWordNet browser also allows for the access to the Spanish, Portuguese, Hebrew, Romanian and Latin WordNets, made available by courtesy of the TALP Group at the Universitat Politecnica de Catalunya (Spain), the NLX-Group at the University of Lisbon (Portugal), the Computational Linguistic Group at the University of Haifa (Israel), the "Alexandru Ioan Cuza" University of Iasi (Romania), and University of Verona (Italy) respectively. Although the Spanish, Portuguese, Hebrew, Romanian and Latin WordNets are compatible with the MultiWordNet model, these wordnets are not part of the MultiWordNet distribution.
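The alignment assumption in this entry (a relation between two Princeton WordNet synsets is carried over to the corresponding Italian synsets) can be sketched as follows. The synset identifiers, the `alignment` table and the `import_relations` function are hypothetical and do not reflect MultiWordNet's real identifiers or file formats.

```python
# Hypothetical PWN relations as (source synset, relation, target synset).
pwn_relations = {
    ("dog.n.01", "hypernym", "canine.n.01"),
    ("canine.n.01", "hypernym", "carnivore.n.01"),
}

# Hypothetical alignment from PWN synsets to Italian synsets.
# "carnivore.n.01" is deliberately left unaligned to mimic a lexical gap.
alignment = {
    "dog.n.01": "cane.n.01",
    "canine.n.01": "canide.n.01",
}


def import_relations(relations, alignment):
    """Copy each relation whose two endpoints both have an aligned synset."""
    return {
        (alignment[source], relation, alignment[target])
        for source, relation, target in relations
        if source in alignment and target in alignment
    }


print(import_relations(pwn_relations, alignment))
```

Relations that touch an unaligned synset are simply skipped in this sketch, which is one simple way to leave room for the lexical gaps the entry mentions.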
Tesoro della Lingua Italiana delle Origini
Tesoro della Lingua Italiana delle Origini (TLIO), the first section of the chronological historical Italian vocabulary.
A selection of the same entries is printed in the Bulletin of the OVI; the online version may be more up to date than the printed version.
Corpus Analysis with Antconc
Corpus analysis is a form of text analysis which allows you to make comparisons between textual objects at a large scale (so-called ‘distant reading’). It allows us to see things that we don’t necessarily see when reading as humans. If you’ve got a collection of documents, you may want to find patterns of grammatical use, or frequently recurring phrases in your corpus. You also may want to find statistically likely and/or unlikely phrases for a particular author or kind of text, particular kinds of grammatical structures or a lot of examples of a particular concept across a large number of documents in context. Corpus analysis is especially useful for testing intuitions about texts and/or triangulating results from other digital methods.
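Two of the operations mentioned in this entry, viewing a term in context and finding frequently recurring phrases, can be sketched in plain Python. This is not AntConc (a separate desktop application) and not code from the lesson; the function names and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, width=2):
    """Keyword in context: return (left, keyword, right) for each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def top_ngrams(tokens, n=2, k=3):
    """Return the k most frequent n-word sequences."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(k)


text = "The whale surfaced. The white whale dived, and the white whale was gone."
tokens = tokenize(text)

for left, keyword, right in kwic(tokens, "whale"):
    print(f"{left:>12} [{keyword}] {right}")
print(top_ngrams(tokens, n=2, k=2))
```

Even on a toy text, the concordance lines and n-gram counts surface the repeated phrase "white whale", the kind of pattern that is hard to notice when reading linearly.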
Transliterating non-ASCII characters with Python
This lesson shows how to use Python to transliterate automatically a list of words from a language with a non-Latin alphabet to a standardized format using the American Standard Code for Information Interchange (ASCII) characters. It builds on readers’ understanding of Python from the lessons “Viewing HTML Files,” “Working with Web Pages,” “From HTML to List of Words (part 1)” and “Intro to Beautiful Soup.” At the end of the lesson, we will use the transliteration dictionary to convert the names from a database of the Russian organization Memorial from Cyrillic into Latin characters. Although the example uses Cyrillic characters, the technique can be reproduced with other alphabets using Unicode.
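The core idea, a transliteration dictionary applied character by character, can be sketched as below. This is not the lesson's own code: the `CYRILLIC_TO_LATIN` table is an illustrative mapping whose romanization choices are assumptions, and the `transliterate` function is hypothetical.

```python
# Illustrative Cyrillic-to-Latin table; the romanization choices
# (for example "kh" for "х") are assumptions, not a fixed standard.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "iu", "я": "ia",
}


def transliterate(word, table=CYRILLIC_TO_LATIN):
    """Replace each character using the table, preserving capitalization."""
    result = []
    for char in word:
        # Characters missing from the table are passed through unchanged.
        latin = table.get(char.lower(), char)
        if char.isupper():
            latin = latin.capitalize()
        result.append(latin)
    return "".join(result)


print(transliterate("Иванов"))  # Ivanov
print(transliterate("Чехов"))   # Chekhov
```

Swapping in a different table is all that is needed to handle another alphabet, since Python strings are sequences of Unicode characters.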
WordNet. A lexical database for English
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.
UCREL Semantic Analysis System (USAS)
The UCREL semantic analysis system is a framework for undertaking the automatic semantic analysis of text. The framework has been designed and used across a number of research projects and this page collects together various pointers to those projects and publications produced since 1990.
Intro to the Zotero API
In this lesson, you’ll learn how to use Python with the Zotero API to interact with your Zotero library. The Zotero API is a powerful interface that would allow you to build a complete Zotero client from scratch if you so desired. But like most APIs, it works in small, discrete steps, so we have to build our way up to the complicated requests we might want to use to access our Zotero libraries. This incremental building gives us plenty of time to learn as we go along.
Starting with RefWorks
This document is an introduction to RefWorks for staff and students. RefWorks is an online research management, writing and collaboration tool designed to help researchers easily gather, manage, store and share all types of information, as well as generate citations and bibliographies. The document is a workbook for an introductory workshop that explains and demonstrates how to set up a small database of references and use it when preparing a document in MS Word. The workbook contains copies of a PowerPoint presentation that is also available on this site.
Pen to Paper image by mbgrigby shared under a CC BY-NC-ND 2.0 license.
Curso básico de Mapas Mentales
Mind maps help us address important issues more effectively, save time, generate ideas and strategies more easily, and improve our ability to plan and manage. Their educational value for producing teaching material is clear.
AntPConc
A freeware parallel corpus analysis toolkit for concordancing and text analysis.
TEI by Example Tutorial
TEI By Example provides tutorial modules for eight different areas of electronic text encoding with the Guidelines developed by the Text Encoding Initiative. The main component of each module is an introductory tutorial offering a thematic approach to the most significant concepts described in the TEI Guidelines, which should help interested novices start encoding their text of choice with the most recent version of TEI.
P5: Guidelines for Electronic Text Encoding and Interchange
These Guidelines have been developed and are maintained by the Text Encoding Initiative Consortium (TEI); see iv.2. Historical Background. They are addressed to anyone who works with any kind of textual resource in digital form.
They make recommendations about suitable ways of representing those features of textual resources which need to be identified explicitly in order to facilitate processing by computer programs. In particular, they specify a set of markers (or tags) which may be inserted in the electronic representation of the text, in order to mark the text structure and other features of interest. Many, or most, computer programs depend on the presence of such explicit markers for their functionality, since without them a digitized text appears to be nothing but a sequence of undifferentiated bits. The success of the World Wide Web, for example, is partly a consequence of its use of such markup to indicate such features as headings and lists on individual pages, and to indicate links between pages. The process of inserting such explicit markers for implicit textual features is often called ‘markup’, or equivalently within this work ‘encoding’; the term ‘tagging’ is also used informally. We use the term encoding scheme or markup language to denote the complete set of rules associated with the use of markup in a given context; we use the term markup vocabulary for the specific set of markers or named distinctions employed by a given encoding scheme. Thus, this work both describes the TEI encoding scheme, and documents the TEI markup vocabulary.
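The role of explicit markers described above can be illustrated with a minimal Python sketch. It is not taken from the Guidelines: the fragment is hypothetical and simplified (it uses the TEI element names `div`, `head` and `p`, but omits the TEI namespace and header that a real TEI document would carry), and the `outline` helper is an assumed name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: a real TEI document would also
# declare the TEI namespace and include a teiHeader.
SAMPLE = """
<text>
  <body>
    <div>
      <head>Chapter 1</head>
      <p>It was a dark night.</p>
      <p>The rain fell.</p>
    </div>
    <div>
      <head>Chapter 2</head>
      <p>Morning came.</p>
    </div>
  </body>
</text>
"""


def outline(xml_text):
    """Return (heading, paragraph_count) for each <div> in the markup."""
    root = ET.fromstring(xml_text)
    result = []
    for div in root.iter("div"):
        head = div.find("head")
        title = head.text if head is not None else ""
        # Count only the paragraphs that are direct children of this div.
        result.append((title, len(div.findall("p"))))
    return result


print(outline(SAMPLE))
```

Because the headings and paragraphs are tagged explicitly, the program can recover the structure of the text without guessing; the same words with the tags stripped would give it nothing to distinguish a heading from a paragraph.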
- Subject areas
- Type of material
- Terms of use
Data Dictionary Generator
Aimed at the TEI editing community and intended to be run inside the XML Editor, the DDG generates profiles of every element and attribute appearing in a TEI file. Each entry includes a definition from the TEI Guidelines, a local, project-specific definition (if provided), and a brief snapshot of how the element or attribute is actually being used. By making it easy to compare these three things, the DDG aims to help project editors reflect on current practice within their projects and quickly create stronger encoding guidelines for their collaborators.
- Subject areas
- Type of material
- Terms of use
Developing Linguistic Corpora: a Guide to Good Practice
A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study.
In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose.
This Guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will also find the guidelines here useful. It also has relevance for those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their analysis.
- Subject areas
- Tags
- Type of material
- Terms of use
TreeTagger
The TreeTagger is a tool for annotating text with part-of-speech and lemma information. It was developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Portuguese, Galician, Chinese, Swahili, Slovak, Slovenian, Latin, Estonian, Polish and Old French texts, and it can be adapted to other languages if a lexicon and a manually tagged training corpus are available.
- Subject areas
- Type of material
- Terms of use
LIM
The Lim system is an environment for creating educational materials. It consists of an activity editor (EdiLim), a viewer (LIM) and a file in XML format (the book) that defines the properties of the book and the pages that compose it.
- Subject areas
- Type of material
- Terms of use
AR SPELL Podcasting in the ELL Classroom
Podcasting can be a great way to get students, parents, and community members involved with classroom activities and information. ELL students can use podcasting to demonstrate the skills they are developing and to reach other ELL students who may be encountering similar difficulties.
- Subject areas
- Tags
- Type of material
- Terms of use
Corpus Linguistics: Method, Analysis, Interpretation
This MOOC, offered by Lancaster University, gives a practical introduction to the methodology of corpus linguistics for researchers in the social sciences and humanities. It offers those with an interest in language who have not encountered the corpus approach before a new way of looking at language.
- Subject areas
- Type of material
- Terms of use
Pronunciation Exercises for English Language Learners
Pronunciation Exercises for English Language Learners, by Waikato Pathways College, University of Waikato, and Oxford University Press.
- Subject areas
- Tags
- Type of material
- Terms of use