Getting Started

Welcome to the Overtone documentation! This section walks you through the first steps of using the tools: what features Overtone offers, how to set it up, and how to use the different text-to-speech models.

Models

Overtone provides a versatile text-to-speech solution, supporting over 15 languages to cater to a diverse user base. It is important to note that the quality of each model varies, which in turn affects the voice output. Overtone offers four quality variations: X-LOW, LOW, MEDIUM, and HIGH, allowing users to choose the one that best fits their needs.

The plugin includes a default English-only model, called LibriTTS, which boasts a selection of more than 900 distinct voices, readily available for use. As lower quality models are faster to process, they are particularly well-suited for mobile devices, where speed and efficiency are crucial.

How to download models

The TTSVoice component provides a convenient interface for downloading models with a single click. Alternatively, you can open the download window from Window > Overtone > Download Manager.

Demos

The plugin contains a demo that demonstrates its functionality: Text to speech. You can input text, select a downloaded voice in the TTSVoice component, and listen to it.

Scripts

TTSEngine

This class loads the model into memory and sets it up. Add it to every scene in which Overtone will be used. It exposes one method, Speak, which receives a string and a TTSVoiceNative and returns an AudioClip.

public async Task<AudioClip> Speak(string text, TTSVoiceNative voice)

Example programmatic usage:

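// Engine is a TTSEngine in the scene and source is an AudioSource.
string text = "Hello World!";
TTSVoiceNative voice = TTSVoiceNative.LoadVoiceFromResources("en-us-ryan-high");
// Pass the voice loaded above to Speak.
AudioClip audioClip = await Engine.Speak(text, voice);
source.clip = audioClip;
source.loop = false;
source.Play();

// Free the voice model once it is no longer needed.
voice.Dispose();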
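
The snippet above can be placed inside a component. The following is a minimal sketch, not the plugin's own code: the class name SpeakOnStart and its two serialized fields are hypothetical, and the using directive for Overtone's namespace is left out because this documentation does not name it. Only Speak, LoadVoiceFromResources, and Dispose are taken from the text above.

```csharp
using UnityEngine;
// Add the using directive for Overtone's namespace here (not named in this documentation).

// Hypothetical component, for illustration only.
public class SpeakOnStart : MonoBehaviour
{
    [SerializeField] private TTSEngine engine;    // TTSEngine placed in the scene
    [SerializeField] private AudioSource source;  // where the generated clip is played

    private async void Start()
    {
        // Voice name taken from the example above.
        TTSVoiceNative voice = TTSVoiceNative.LoadVoiceFromResources("en-us-ryan-high");
        try
        {
            AudioClip clip = await engine.Speak("Hello World!", voice);
            source.clip = clip;
            source.loop = false;
            source.Play();
        }
        finally
        {
            // Release the voice model even if synthesis throws.
            voice.Dispose();
        }
    }
}
```

Wrapping the call in try/finally is a design choice of this sketch: it ensures the voice is disposed on every path, at the cost of reloading it for each utterance.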

TTSVoice

This script loads a voice model and frees it when necessary. It also allows the user to select the speaker id to use in the voice model.

Script Reference for `TTSVoice.cs`

Property	Type	Description	Default Value
speakerId	int	The speaker id to be used	0
voiceName	string	The model to use	libritts

TTSPlayer

TTSPlayer.cs is a script that combines a TTSVoice and a TTSEngine to synthesize speech from text and play it through an AudioSource.

Script Reference for `TTSPlayer.cs`

Property	Type	Description	Default Value
Engine	TTSEngine	The TTSEngine to use	null
Voice	TTSVoice	The voice model to use	null
Source	AudioSource	The source where to output the generated audio	null

SSMLPreprocessor

SSMLPreprocessor.cs is a static class that offers limited SSML (Speech Synthesis Markup Language) support for Overtone. Currently, this class supports preprocessing for the <break> tag.

Speech Synthesis Markup Language (SSML) is an XML-based markup language that provides a standard way to control various aspects of synthesized speech output, including pronunciation, volume, pitch, and speed.

We plan to extend SSML support in future updates; for now, the SSMLPreprocessor class only recognizes the <break> tag.

The <break> tag allows you to add a pause in the synthesized speech output.
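
As an illustration, input with a pause might look like `Hello.<break time="500ms"/>How are you?`, assuming the standard SSML time attribute is used. The sketch below shows one way such input could be split into text segments and pause lengths. BreakTagSketch and Split are hypothetical names and not part of Overtone; the actual API and behavior of SSMLPreprocessor may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; this is not Overtone's SSMLPreprocessor.
public static class BreakTagSketch
{
    // Matches self-closing tags such as <break time="500ms"/> or <break time="1.5s"/>.
    private static readonly Regex BreakTag = new Regex(
        "<break\\s+time=\"(?<value>[0-9]*\\.?[0-9]+)(?<unit>ms|s)\"\\s*/>",
        RegexOptions.IgnoreCase);

    // Splits the input into (text, pause after the text in seconds) pairs.
    public static List<(string Text, float PauseSeconds)> Split(string input)
    {
        var parts = new List<(string Text, float PauseSeconds)>();
        int cursor = 0;
        foreach (Match m in BreakTag.Matches(input))
        {
            // Text between the previous tag (or the start) and this tag.
            string segment = input.Substring(cursor, m.Index - cursor).Trim();
            float value = float.Parse(m.Groups["value"].Value, CultureInfo.InvariantCulture);
            bool isMilliseconds = m.Groups["unit"].Value.ToLowerInvariant() == "ms";
            parts.Add((segment, isMilliseconds ? value / 1000f : value));
            cursor = m.Index + m.Length;
        }
        // Remaining text after the last tag has no trailing pause.
        string tail = input.Substring(cursor).Trim();
        if (tail.Length > 0) parts.Add((tail, 0f));
        return parts;
    }
}
```

For the input shown above, Split returns two entries: "Hello." followed by a 0.5 second pause, and "How are you?" with no pause. Each segment could then be synthesized separately, with silence inserted between the resulting clips.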

Supported Platforms

Overtone supports the following platforms:

Platform	Supported
Windows	✅
Android	✅
iOS	✅
MacOS	✅
Linux	✅
WebGL	❌
Oculus	✅
HoloLens	❌

If you are interested in any other platform, please reach out.

Supported Languages

Language	Best Quality	Number of Voices
Català (Espanya) (ca-es)	MEDIUM	3
Čeština (Česká Republika) (cs-cz)	MEDIUM	1
Dansk (Danmark) (da-dk)	MEDIUM	1
Deutsch (Deutschland) (de-de)	HIGH	16
Ελληνικά (Ελλάδα) (el-gr)	LOW	1
English (United Kingdom) (en-gb)	MEDIUM	131
English (United States) (en-us)	HIGH	958
Español (España) (es-es)	MEDIUM	6
Español (México) (es-mx)	MEDIUM	1
Suomi (Suomi) (fi-fi)	MEDIUM	2
Français (France) (fr-fr)	MEDIUM	6
Magyar (Magyarország) (hu-hu)	MEDIUM	2
Íslenska (Ísland) (is-is)	MEDIUM	4
Italiano (Italia) (it-it)	X-LOW	1
ქართული (საქართველო) (ka-ge)	MEDIUM	1
Қазақ Тілі (Қазақстан) (kk-kz)	HIGH	8
Lëtzebuergesch (Lëtzebuerg) (lb-lu)	MEDIUM	1
नेपाली (नेपाल) (ne-np)	MEDIUM	36
Nederlands (België) (nl-be)	MEDIUM	4
Nederlands (Nederland) (nl-nl)	LOW	2
Norsk (no-no)	MEDIUM	1
Polski (Polska) (pl-pl)	MEDIUM	3
Português (Brasil) (pt-br)	MEDIUM	2
Português (Portugal) (pt-pt)	MEDIUM	1
Română (România) (ro-ro)	MEDIUM	1
Русский (Россия) (ru-ru)	MEDIUM	4
Slovenčina (Slovensko) (sk-sk)	MEDIUM	1
Српски (sr-rs)	MEDIUM	2
Svenska (Sverige) (sv-se)	MEDIUM	1
Kiswahili (sw-cd)	MEDIUM	1
Türkçe (Türkiye) (tr-tr)	MEDIUM	2
Українська (Україна) (uk-ua)	MEDIUM	4
Tiếng Việt (Việt Nam) (vi-vn)	MEDIUM	67
中文 (中国) (zh-cn)	MEDIUM	2

Troubleshooting

For any questions, issues, or feature requests, don’t hesitate to email us at help@leastsquares.io or join the Discord. We are happy to help and aim to respond quickly :)

Overtone - Realistic AI Offline Text to Speech (TTS)