What does it take for artificially synthesized speech to be suitable for gaming?

Discussion in 'Game Development (Technical)' started by Aharon Satt, Jan 17, 2018.

  1. Aharon Satt

    Aharon Satt New Member

    Joined:
    Jan 15, 2018
    Messages:
    3
    Likes Received:
    0
    In this post I would like to share some opinions about Text-to-Speech (TTS) technology in the context of gaming. I am not a gaming expert; my expertise is in signal and speech processing, and I work in the IBM Research division.

    I hope this post will trigger a discussion and an exchange of opinions. My purpose here is to discuss “quality”. I plan to discuss technology trends in follow-up posts.

    The potential benefit of “good quality” TTS technology for game developers is clear. Yet TTS is still regarded as delivering “insufficient quality”. What is “quality” anyway, in our context?

    1. The basic quality of modern TTS is generally good. State-of-the-art machine learning algorithms predict the prosody (“intonation”, duration, loudness, emphasis and more) well from the text, and the synthesized speech achieves good scores in subjective quality tests. It is no longer the “robotic” sound it used to be; it sounds natural and “clean”. For applications such as announcements or commercial question answering, modern TTS provides a good alternative to recorded human speech.

    2. Natural speech, however, needs correct emphasis on different words across sentences. Because natural language is ambiguous, algorithms (and humans as well) cannot always determine the “correct” emphasis from the text of isolated sentences or utterances without knowing the full context. This can limit the quality achievable by modern TTS technology.

    3. When we consider gaming applications, additional needs arise. For example, generating the speech for an action scene in a formal style would sound weird: where are the emotions? The emotional content of human speech is essential for conveying messages, so it is certainly an important aspect of “quality”.

    4. Yet another aspect of “quality”, at least in the broader sense, is the variety of voices. Most modern TTS technologies are based on pre-recorded human voices (recordings of voice talents uttering a large collection of sentences). Because recording speech and processing the recordings are expensive and time-consuming, the variety of voices in typical TTS products is limited: often a few, and less commonly several tens of, different voices per “major” language. Moreover, gaming often requires non-human voices, such as “cartoonish” ones, to best support the different characters. To summarize, repeatedly using the same voices across games and characters amounts to a less than optimal experience or, in other words, to lower “quality”.
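
    As a rough illustration of points 2 and 3, here is a minimal Python sketch of how a writer could mark emphasis and mood explicitly instead of leaving the engine to guess them from the text. It emits SSML, the W3C speech markup. The `to_ssml` helper and the `MOOD_PROSODY` table are hypothetical names with made-up values, the `<speak>` root omits the version and namespace attributes a full document would carry, and engines differ in which tags they actually honor.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from a game "mood" to SSML prosody attributes.
# The values are illustrative guesses, not tuned settings.
MOOD_PROSODY = {
    "calm": {"rate": "slow", "pitch": "low"},
    "neutral": {"rate": "medium", "pitch": "medium"},
    "urgent": {"rate": "fast", "pitch": "high"},
}


def to_ssml(words, emphasized=(), mood="neutral"):
    """Build a simplified SSML string with explicit emphasis and mood.

    words: list of words in the spoken line.
    emphasized: indices of the words the writer wants stressed.
    mood: key into MOOD_PROSODY (an assumption of this sketch).
    """
    prosody = MOOD_PROSODY[mood]
    stressed = set(emphasized)
    parts = []
    for i, word in enumerate(words):
        text = escape(word)  # escape &, < and > so the markup stays valid
        if i in stressed:
            text = f'<emphasis level="strong">{text}</emphasis>'
        parts.append(text)
    body = " ".join(parts)
    # Simplified root element: version/namespace attributes are omitted here.
    return (
        f'<speak><prosody rate="{prosody["rate"]}" pitch="{prosody["pitch"]}">'
        f"{body}</prosody></speak>"
    )


line = "I never said she took it".split()
print(to_ssml(line, emphasized=[0], mood="calm"))
print(to_ssml(line, emphasized=[3], mood="urgent"))
```

    The same six words get a different stressed word and a different delivery in each call, which is the kind of context that the text of an isolated sentence does not carry.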

    I hope this provides some initial insights from the perspective of a speech technology researcher. I hope to get feedback from the gaming experts. Am I right? What have I missed? What would allow game developers to start benefiting from TTS technology?

    All the best, Aharon.
     
  2. Aharon Satt

    Hello,
    We would like to offer our beta-level service for free for about half a year from now, until Aug 10, 2018.

    We hope this tool demonstrates an answer to at least some of the challenges I listed above. We encourage you to read the help to learn about the new capabilities and explore them. We also hope that the use of this tool will trigger further technical discussion.

    I will post more details about the technology behind the scenes over the coming weeks.

    Aharon
     
