Much of the programming for eSpeak's language support was based on information found on Wikipedia, with some subsequent feedback from native speakers.
Projects using eSpeak include NVDA, Ubuntu and OLPC, and it has also been used by Google Translate.
eSpeak is derived from "Speak", a British English speech synthesizer for Acorn RISC OS computers, which was originally written in 1995.
A rewritten version for Linux appeared in February 2006 and a Windows SAPI 5 version in January 2007. Subsequent development has added and improved support for additional languages.
Because of its small size and wide language coverage, eSpeak is included as the default speech synthesizer in NVDA, the open-source screen reader for Windows, and on Ubuntu and other Linux installation discs.
The quality of the language voices varies greatly. Some have had more work or feedback from native speakers than others. Most of the people who have helped to improve the various languages are blind users of text-to-speech.
eSpeak provides two methods of synthesis: the original eSpeak synthesizer and a Klatt synthesizer.
In addition, eSpeak can act as a front-end to MBROLA diphone voices, providing the text-to-phoneme translation and prosody.
The eSpeak and Klatt synthesizers use different types of formant synthesis.
The eSpeak synthesizer creates voiced speech sounds such as vowels and sonorant consonants by adding together sine waves to make the formant peaks. Unvoiced consonants such as /s/ are made by playing recorded sounds. Voiced consonants such as /z/ are made by mixing a synthesized voiced sound with a recorded unvoiced sound.
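To make the additive approach more concrete, here is a minimal Python sketch (not eSpeak's actual code) that builds a voiced, vowel-like waveform by summing sine waves at multiples of a fundamental frequency. Each sine's amplitude is read from an envelope that peaks at the formant frequencies. The formant centres, bandwidths and amplitudes are illustrative assumptions (roughly an /a/-like vowel), not eSpeak's phoneme data, and the Lorentzian peak shape is a simplification chosen for brevity.

```python
import math

def formant_gain(freq, formants):
    """Spectral envelope at `freq`: one peaked bump per formant.

    Each formant is (centre_hz, bandwidth_hz, peak_amplitude). The
    Lorentzian shape is an assumption made for this sketch."""
    gain = 0.0
    for centre, bandwidth, peak in formants:
        # Equals `peak` at the centre and falls off with distance from it
        gain += peak / (1.0 + ((freq - centre) / (bandwidth / 2.0)) ** 2)
    return gain

def additive_vowel(f0, formants, duration_s=0.05, sample_rate=22050):
    """Sum sine waves at harmonics of f0, weighted by the formant envelope."""
    n_samples = int(duration_s * sample_rate)
    nyquist = sample_rate / 2.0
    # One sine wave per harmonic of f0 up to the Nyquist frequency
    harmonics = [(k * f0, formant_gain(k * f0, formants))
                 for k in range(1, int(nyquist // f0) + 1)]
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        samples.append(sum(a * math.sin(2 * math.pi * f * t) for f, a in harmonics))
    # Normalise so the loudest sample has magnitude 1.0
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]

# Illustrative (assumed) formants for an /a/-like vowel: (centre, bandwidth, amplitude)
A_FORMANTS = [(730, 90, 1.0), (1090, 110, 0.5), (2440, 170, 0.25)]
vowel = additive_vowel(f0=120, formants=A_FORMANTS)
```

Harmonics that fall near a formant centre come out loud and the rest are attenuated, which is what produces the formant peaks in the spectrum. Unvoiced sounds such as /s/ would, per the description above, come from recorded samples instead and are not modelled here.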
The Klatt synthesizer mostly uses the same formant data as the eSpeak synthesizer. It produces voiced sounds by starting with a waveform which is rich in harmonics (simulating the vibration of the vocal cords) and then applying digital filters in order to produce speech sounds.
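The source-filter idea can be sketched in Python as follows. An impulse train stands in for the harmonic-rich glottal waveform (a simplification: a real Klatt synthesizer shapes its voicing source more carefully), and it is passed through a cascade of second-order digital resonators. The resonator uses the coefficient formulas from Klatt's 1980 formant synthesizer; the formant frequencies and bandwidths are illustrative assumptions, not eSpeak's data, and this is not eSpeak's implementation.

```python
import math

class Resonator:
    """Second-order digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""

    def __init__(self, freq_hz, bandwidth_hz, sample_rate):
        T = 1.0 / sample_rate
        self.c = -math.exp(-2 * math.pi * bandwidth_hz * T)
        self.b = 2 * math.exp(-math.pi * bandwidth_hz * T) * math.cos(2 * math.pi * freq_hz * T)
        self.a = 1.0 - self.b - self.c  # normalises the gain at 0 Hz to 1
        self.y1 = 0.0  # y[n-1]
        self.y2 = 0.0  # y[n-2]

    def step(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y

def impulse_train(f0, n_samples, sample_rate):
    """Harmonic-rich source: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def cascade_vowel(f0, formants, n_samples=2205, sample_rate=22050):
    """Filter the source through one resonator per (freq, bandwidth) formant."""
    source = impulse_train(f0, n_samples, sample_rate)
    filters = [Resonator(f, bw, sample_rate) for f, bw in formants]
    out = []
    for x in source:
        for r in filters:  # cascade: each resonator feeds the next
            x = r.step(x)
        out.append(x)
    return out

# Illustrative (assumed) /a/-like formants: (frequency, bandwidth) in Hz
wave = cascade_vowel(120, [(730, 90), (1090, 110), (2440, 170)])
```

The contrast with the additive sketch is the design choice the paragraph describes: the additive approach builds formant peaks directly from sines, whereas here the formants are carved out of a broadband source by filtering.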
- eSpeak can be used as a command-line program, or as a shared library.
- It supports Speech Synthesis Markup Language (SSML).
- Language voices are identified by the language's ISO 639-1 code. They can be modified by "voice variants". These are text files which can change characteristics such as pitch range, add effects such as echo, whisper and croaky voice, or make systematic adjustments to formant frequencies to change the sound of the voice. For example, "af" is the Afrikaans voice. "af+f2" is the Afrikaans voice modified with the "f2" voice variant which changes the formants and the pitch range to give a female sound.
- eSpeak uses an ASCII representation of phoneme names which is loosely based on the Kirshenbaum system.
- Phoneme input can be embedded in text by enclosing it in double square brackets. For example, espeak -v en "Hello [[w3:ld]]" says "Hello world" in English.
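The input conventions listed above (a voice name with an optional "+variant" suffix, and phonemes inside double square brackets) can be illustrated with a few small Python helpers. These are hypothetical functions written for this sketch and are not part of eSpeak's API; build_command only assembles the command line shown in the example above and does not run it.

```python
import re

# Hypothetical helpers mirroring eSpeak's documented input conventions;
# they are not eSpeak functions.
PHONEME_SPAN = re.compile(r"\[\[(.*?)\]\]")

def split_phonemes(text):
    """Split input into ('text', ...) and ('phonemes', ...) segments."""
    segments = []
    pos = 0
    for m in PHONEME_SPAN.finditer(text):
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()]))
        segments.append(("phonemes", m.group(1)))  # content between [[ and ]]
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments

def parse_voice(spec):
    """Split a voice name such as 'af+f2' into (language, variant or None)."""
    language, _, variant = spec.partition("+")
    return language, (variant or None)

def build_command(voice, text):
    """Argument list for the command-line form: espeak -v <voice> <text>."""
    return ["espeak", "-v", voice, text]

print(split_phonemes("Hello [[w3:ld]]"))  # [('text', 'Hello '), ('phonemes', 'w3:ld')]
print(parse_voice("af+f2"))               # ('af', 'f2')
```

If eSpeak is installed, passing the list from build_command to subprocess.run would invoke it; that step is left out here because it depends on the local installation.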