
Microsoft brings new voice styles to Azure Cognitive Services

Microsoft today announced new neural text-to-speech (TTS) capabilities in Azure Cognitive Services, its suite of AI-imbued APIs and SDKs, that let developers tailor the voice of their apps and services to fit their brand. Each of three new styles (newscast, customer service, and digital assistant) offers fluid, natural-sounding speech that matches the patterns and intonations of human voices, allowing customers to deliver better, more memorable user experiences, at least in theory.

“Built on a powerful base model, our neural TTS voices are very natural, reliable, and expressive. Through transfer learning, the neural TTS model can learn different speaking styles from various speakers, enabling nuanced voices,” wrote Microsoft in a blog post.

The newscast voice reflects a “professional tone” you might hear on a TV or radio newscast, which is to say it contains no trace of regionalism and uses standard broadcasting pronunciation, a form of pronunciation in which no letters are dropped. In addition to Azure Cognitive Services, Microsoft says that the newscast-style voice is used in Microsoft Listening Docs for WeChat, which can read aloud Word, PowerPoint, and Excel documents and generate audio for online trainings, news podcasts, and more. It’s also in the Bing mobile app: when you search with the voice search feature, the news briefs are read in the newscast voice.

As for the customer service-style voice, it features a “friendly” and “engaging” tone that Microsoft says is tuned for scenarios involving customer support, like reporting a claim. By contrast, the digital assistant voice — which is available in two styles, a chat style for casual, conversational bots and a professional style for applications like in-car digital assistants — features a helpful tone that’s suited to relaying weather forecasts, navigation directions, reminders, and other such information.


Beyond the voice styles optimized for specific scenarios, Microsoft this morning released several new emotion styles, which let a voice express different emotions to fit a given context. These include cheerful and empathetic styles and, in Chinese, a lyrical style, which Microsoft describes as “heartfelt” and optimized for reading prose or poetry.

The new voice styles are available in English and Chinese, while the emotion styles are available for English, Chinese, and Brazilian Portuguese, though not all of the styles are available in all languages. Microsoft notes that the styles can be customized through the Custom Neural Voice feature within Microsoft Speech Studio, allowing brands to build unique voices that benefit from the new scenarios.
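For developers who want to experiment, speaking styles in Azure's neural TTS are typically selected through SSML markup rather than a separate API. The following is a minimal sketch, not taken from Microsoft's announcement: it only builds the SSML string and makes no network calls. The helper name `build_styled_ssml`, the voice name `en-US-AriaNeural`, the style value `newscast`, and the `mstts` namespace URI are assumptions based on Azure's SSML documentation and should be checked against the current list of voices and styles for your language.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace URIs (the mstts URI is an assumption; verify against Azure's SSML docs).
SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_styled_ssml(text, voice="en-US-AriaNeural", style="newscast", lang="en-US"):
    """Return an SSML document asking `voice` to read `text` in `style`.

    The default voice and style values are illustrative assumptions; not every
    voice supports every style.
    """
    ssml = (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, <, > so the text cannot break the markup
        f"</mstts:express-as>"
        f"</voice>"
        f"</speak>"
    )
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    return ssml


if __name__ == "__main__":
    print(build_styled_ssml("Rain & wind are expected this evening."))
```

The resulting string would then be passed to whichever speech synthesis client you use. Swapping the `style` argument for another supported value is how the scenario and emotion styles described above would be selected, assuming the chosen voice offers that style.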

Microsoft is effectively going toe to toe with Google, which last year debuted 31 new AI-synthesized WaveNet voices and 24 new standard voices in its Cloud Text-to-Speech service (bringing the total number of WaveNet voices to 57). It has another rival in Amazon, which recently launched a service — Brand Voice — that taps AI to generate custom spokespeople, and which offers a number of voice styles and emotion styles through Amazon Polly, Amazon’s cloud offering that converts text into speech.


VentureBeat
