Transliterating non-ASCII characters with Python


This lesson shows how to use Python to automatically transliterate a list of words from a language with a non-Latin alphabet into a standardized format that uses only American Standard Code for Information Interchange (ASCII) characters. It builds on readers’ understanding of Python from the lessons “Viewing HTML Files,” “Working with Web Pages,” “From HTML to List of Words (part 1)” and “Intro to Beautiful Soup.” At the end of the lesson, we use the transliteration dictionary to convert the names in a database of the Russian organization Memorial from Cyrillic into Latin characters. Although the example uses Cyrillic characters, the same technique can be applied to any other alphabet that Unicode encodes.
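The dictionary-based approach described above can be sketched roughly as follows. This is a minimal illustration, not the lesson's own code: the mapping `LOWERCASE_MAP` covers only a handful of letters, its romanization choices are assumptions, and the function name `transliterate` is hypothetical. It relies on Python 3's built-in `str.maketrans` and `str.translate`.

```python
# Partial, illustrative Cyrillic-to-Latin mapping (an assumption for this
# sketch; a real table would cover the whole alphabet).
LOWERCASE_MAP = {
    "а": "a",
    "в": "v",
    "ж": "zh",
    "и": "i",
    "к": "k",
    "н": "n",
    "о": "o",
    "у": "u",
}

# Derive the uppercase entries so that "Ж" becomes "Zh" rather than "ZH".
FULL_MAP = dict(LOWERCASE_MAP)
for cyrillic, latin in LOWERCASE_MAP.items():
    FULL_MAP[cyrillic.upper()] = latin.capitalize()

# str.maketrans accepts a dict whose keys are single characters and whose
# values may be strings of any length.
TRANSLATION_TABLE = str.maketrans(FULL_MAP)


def transliterate(text):
    """Replace every mapped Cyrillic character; leave all others unchanged."""
    return text.translate(TRANSLATION_TABLE)


names = ["Иванов", "Жуков"]
latin_names = [transliterate(name) for name in names]
print(latin_names)
```

Characters that are missing from the mapping pass through untouched, so checking the output with `str.isascii()` is one simple way to spot letters the dictionary does not yet cover.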

Other metadata
author: Bernstein, Seth
publisher: Programming Historian

Submitted by Fernando Martínez de Carnero
23/11/2015
in the project Strumenti e tecnologie per insegnare le lingue

last updated 24/11/2015

Original editing language: unknown