Transliterating non-ASCII characters with Python
This lesson shows how to use Python to automatically transliterate a list of words from a language with a non-Latin alphabet into a standardized format using American Standard Code for Information Interchange (ASCII) characters. It builds on readers’ understanding of Python from the lessons “Viewing HTML Files,” “Working with Web Pages,” “From HTML to List of Words (part 1)” and “Intro to Beautiful Soup.” At the end of the lesson, we will use the transliteration dictionary to convert the names in a database of the Russian organization Memorial from Cyrillic into Latin characters. Although the example uses Cyrillic characters, the technique can be reproduced for any other alphabet represented in Unicode.
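The dictionary-based approach described above can be sketched roughly as follows. This is a minimal illustration, not the lesson's own code: the mapping is deliberately partial, the romanization choices (for example "zh" for "ж") are assumptions, and the names `LOWERCASE_MAP`, `TRANSLIT_TABLE` and `transliterate` are hypothetical.

```python
# Partial, illustrative Cyrillic-to-Latin mapping (an assumption, not the
# lesson's actual table). Some letters map to more than one Latin character.
LOWERCASE_MAP = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "ч": "ch", "ш": "sh",
}

# Derive the uppercase entries so that "Ж" becomes "Zh", and so on.
TRANSLIT_TABLE = dict(LOWERCASE_MAP)
for cyrillic, latin in LOWERCASE_MAP.items():
    TRANSLIT_TABLE[cyrillic.upper()] = latin.capitalize()


def transliterate(text, table=TRANSLIT_TABLE):
    """Replace every character found in the table; leave the rest unchanged."""
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    for name in ["Иванов", "Пушкин", "Жуков"]:
        print(name, "->", transliterate(name))
```

Characters missing from the table pass through untouched, which makes gaps in the mapping easy to spot in the output. Adapting the sketch to another alphabet would only require a different dictionary.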
- Other metadata
- author: Bernstein, Seth
- publisher: Programming Historian
Submitted by
Fernando Martínez de Carnero
23/11/2015
in the project Strumenti e tecnologie per insegnare le lingue
last updated 24/11/2015