Combilex is a high-quality, keyword-based pronunciation lexicon that is easily adapted for use in text-to-speech synthesis (voice building or run-time synthesis) and in speech recognition systems.
Combilex is available as two lexicons:
- Received Pronunciation of English (Combilex RP)
- General American (Combilex GA)
Each lexicon contains c. 145,000 entries, including the 20,000 most frequent words and many useful proper names, and provides a variety of linguistic information alongside detailed pronunciations.
The system is implemented as a database, allowing compact representations of word-forms, their morphological derivations, compounds and cross-references.
Combilex is an ASCII text file with one entry per line. It contains a rich specification for each word, covering pronunciation (and variants), Penn Treebank-style part-of-speech tags, morphological boundaries, full correspondence between orthography and pronunciation, and semantic information where available. The orthographic-phonemic correspondences are fully manually notated, allowing derivation of accurate grapheme-to-phoneme rules.
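As a rough illustration of why aligned orthographic-phonemic correspondences help with grapheme-to-phoneme rules, the Python sketch below counts how often each grapheme chunk maps to each phoneme and picks the most frequent mapping. The toy entries, phoneme symbols, and function names are all invented for this example. They do not reflect the actual Combilex file format, and this context-free count is only a simple baseline, not the method used to build rules from Combilex.

```python
from collections import Counter, defaultdict

# Toy aligned entries: (word, [(grapheme chunk, phoneme), ...]).
# Hypothetical data: the alignments and phoneme symbols are invented
# for illustration and are not taken from Combilex.
ALIGNED = [
    ("cat", [("c", "k"), ("a", "a"), ("t", "t")]),
    ("city", [("c", "s"), ("i", "i"), ("t", "t"), ("y", "i")]),
    ("back", [("b", "b"), ("a", "a"), ("ck", "k")]),
    ("cot", [("c", "k"), ("o", "o"), ("t", "t")]),
]


def count_correspondences(entries):
    """Count how often each grapheme chunk is aligned with each phoneme."""
    counts = defaultdict(Counter)
    for _word, pairs in entries:
        for grapheme, phoneme in pairs:
            counts[grapheme][phoneme] += 1
    return counts


def most_likely_phoneme(counts, grapheme):
    """Return the most frequent phoneme for a grapheme chunk, or None if unseen."""
    if grapheme not in counts:
        return None
    return counts[grapheme].most_common(1)[0][0]


counts = count_correspondences(ALIGNED)
print(most_likely_phoneme(counts, "c"))
```

In this toy data "c" is aligned with "k" twice and with "s" once, so the baseline picks "k". A practical rule set would also condition on the neighbouring letters, which is where per-letter alignments across a large lexicon become valuable.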
Combilex was produced by the Centre for Speech Technology Research over three years and featured 100% manual quality control by expert lexicographers.
Richmond K, Clark R, Fitt S. (2009). Robust LTS rules with the Combilex speech technology lexicon. Interspeech 2009, Brighton.
The University of Edinburgh is seeking interest from commercial organisations to license this technology on a non-exclusive basis for internal product development purposes within all identified applications for speech synthesis.
Download this software today!
The Combilex RP and Combilex GA lexicons can be licensed online under a standard non-exclusive commercial licence agreement (with £2,500 signing fee) through Edinburgh Research and Innovation's Click-thru Licensing System.
This end-to-end licensing system streamlines the process, allowing you to arrange online payment and download the software within minutes.