Keyword-based pronunciation lexicons for speech synthesis

University: University of Edinburgh

Sector(s): Information & Communications Technologies

About Opportunity:

This high-quality keyword-based pronunciation lexicon (Combilex) is easily adaptable for use in text-to-speech synthesis (voice-building or run-time synthesis) and in speech recognition systems.

Combilex is available as two lexicons:

  • Received Pronunciation of English (Combilex RP)
  • General American (Combilex GA)

Each lexicon contains c. 145,000 entries, including the 20,000 most frequent words and many useful proper names. Alongside detailed pronunciations, each entry carries a variety of linguistic information.

The system is implemented as a database, allowing compact representations of word-forms, their morphological derivations, compounds and cross-references.

Combilex is an ASCII text file with one entry per line. Full, manually notated orthographic-phonemic correspondences are included, allowing accurate grapheme-to-phoneme rules to be derived. Each word has a rich specification covering pronunciation (and variants), Penn Treebank-style part-of-speech tags, morphological boundaries, full correspondence between orthography and pronunciation, and semantic information where available.
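As a rough illustration of how aligned orthographic-phonemic correspondences can feed grapheme-to-phoneme rules, the sketch below counts how often each grapheme maps to each phoneme and keeps the most frequent mapping. The data structure, the words and the phoneme symbols are all hypothetical and do not reflect the actual Combilex file format or symbol set; real letter-to-sound rules would also take surrounding context into account.

```python
from collections import Counter, defaultdict

# Hypothetical aligned entries: word -> list of (grapheme, phoneme) pairs.
# This is NOT the Combilex format; the symbols are illustrative only.
ALIGNED = {
    "cat": [("c", "k"), ("a", "a"), ("t", "t")],
    "city": [("c", "s"), ("i", "i"), ("t", "t"), ("y", "i")],
    "cot": [("c", "k"), ("o", "o"), ("t", "t")],
    "ship": [("sh", "S"), ("i", "i"), ("p", "p")],
}


def count_correspondences(aligned):
    """Count how often each grapheme is aligned with each phoneme."""
    counts = defaultdict(Counter)
    for pairs in aligned.values():
        for grapheme, phoneme in pairs:
            counts[grapheme][phoneme] += 1
    return counts


def most_likely_rules(counts):
    """Keep, for each grapheme, the phoneme it maps to most often."""
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


counts = count_correspondences(ALIGNED)
rules = most_likely_rules(counts)
print(rules["c"])  # "c" is aligned with "k" twice and "s" once
```

A context-free table like this is only a baseline; it shows why a manually checked alignment matters, since every count depends on the grapheme and phoneme being paired correctly.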

Combilex was produced by the Centre for Speech Technology Research over three years and featured 100% manual quality control by expert lexicographers.


Richmond K, Clark R, Fitt S. (2009). Robust LTS rules with the Combilex speech technology lexicon. Interspeech 2009, Brighton.

Key Benefits:

  • Robust letter-to-sound (LTS) rules with better than 86% accuracy
  • Accent-independent lexicon
  • Transcriptions include a phonemic-orthographic link, supporting the development of letter-to-sound rules for out-of-vocabulary words
  • The transcriptions use a meta-symbol set, which may be converted by rule into appropriate forms for various accents
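The idea of converting a meta-symbol transcription into accent-specific forms by rule can be sketched as follows. The meta-symbols ("R?" for an r that is pronounced only in rhotic accents, "A~" for a vowel that differs between accents), the surface symbols and the rule table are invented for illustration and are not the Combilex symbol set or its actual rules.

```python
# Hypothetical per-accent rewrite rules for two invented meta-symbols.
# An empty string means the meta-symbol is not pronounced in that accent.
ACCENT_RULES = {
    "RP": {"R?": "", "A~": "aa"},
    "GA": {"R?": "r", "A~": "ae"},
}


def realise(meta_transcription, accent):
    """Map a list of meta-symbols to accent-specific surface symbols."""
    rules = ACCENT_RULES[accent]
    surface = []
    for symbol in meta_transcription:
        # Symbols without a rule are shared by all accents and pass through.
        realised = rules.get(symbol, symbol)
        if realised:
            surface.append(realised)
    return surface


BATH = ["b", "A~", "th"]  # one accent-independent entry
CAR = ["k", "aa", "R?"]

print(realise(BATH, "RP"), realise(BATH, "GA"))
print(realise(CAR, "RP"), realise(CAR, "GA"))
```

Under these assumptions a single stored entry yields both accent forms, which is the sense in which one meta-symbol transcription can serve several accents.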

Applications:

  • Speech synthesis

IP Status:

The University of Edinburgh is seeking interest from commercial organisations to license this technology on a non-exclusive basis for internal product development purposes within all identified applications for speech synthesis.

Download this software today!

The Combilex RP and Combilex GA lexicons can be licensed online under a standard non-exclusive commercial licence agreement (with £2,500 signing fee) through Edinburgh Research and Innovation's Click-thru Licensing System.

This end-to-end licensing system streamlines the process, allowing you to arrange online payment and download the software within minutes.


