Microsoft brings new voice styles to Azure Cognitive Services

April 2, 2020

Microsoft today announced the launch of new neural text-to-speech (TTS) capabilities in Azure Cognitive Services, its suite of AI-imbued APIs and SDKs, that enable developers to tailor the voice of their apps and services to fit their brand. Each of the three new styles (newscast, customer service, and digital assistant) offers fluid and natural-sounding speech that matches the patterns and intonations of human voices, allowing customers to deliver better, more memorable user experiences, at least in theory.

“Built on a powerful base model, our neural TTS voices are very natural, reliable, and expressive. Through transfer learning, the neural TTS model can learn different speaking styles from various speakers, enabling nuanced voices,” wrote Microsoft in a blog post.

The newscast voice reflects a “professional tone” you might hear on a TV or radio newscast, which is to say it contains no trace of regionalism and uses standard broadcasting pronunciation, a form of pronunciation in which no letters are dropped. In addition to Azure Cognitive Services, Microsoft says that the newscast-style voice is in the Microsoft Listening Docs for WeChat, which can read aloud Word, PowerPoint, and Excel documents and generate audio for online trainings, news podcasts, and more. It’s also in the Bing mobile app: when you search with the voice search feature, you’ll hear the news briefs read in the newscast voice.

As for the customer service-style voice, it features a “friendly” and “engaging” tone that Microsoft says is tuned for scenarios involving customer support, like reporting a claim. By contrast, the digital assistant voice — which is available in two styles, a chat style for casual, conversational bots and a professional style for applications like in-car digital assistants — features a helpful tone that’s suited to relaying weather forecasts, navigation directions, reminders, and other such information.

Beyond the voice styles optimized for specific scenarios, Microsoft this morning released several new emotion styles, which can be adjusted to express different emotions to fit a given context. These include cheerfulness and empathy and, in Chinese, a lyrical style, which Microsoft describes as “heartfelt” and optimized to read prose or poetry.
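For developers wondering how a style is actually selected, the following is a minimal sketch rather than Microsoft's own sample. It assumes the styles are requested through Speech Synthesis Markup Language (SSML) using an mstts:express-as element, and that the voice name (en-US-AriaNeural), the style identifiers (such as customerservice or newscast), and the namespace URIs match the current Speech service documentation. None of those details appear in the announcement as reported here, so verify them before relying on them. The helper build_styled_ssml is a hypothetical name, and it only builds the request body; it does not call the service.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed namespace URIs; confirm against the Speech service SSML reference.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_styled_ssml(text, style, voice="en-US-AriaNeural", lang="en-US"):
    """Return an SSML string asking for `text` to be spoken in `style`.

    `voice` and `style` are assumed identifiers, not values taken from
    the article; the function only formats markup and escapes the text.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, <, > so user text cannot break the markup
        "</mstts:express-as></voice></speak>"
    )


if __name__ == "__main__":
    # Example: a customer-support greeting in the (assumed) customer service style.
    print(build_styled_ssml("Thanks for calling. How can I help?", "customerservice"))
```

The resulting string would then be sent to the text-to-speech endpoint or passed to a Speech SDK synthesizer. Switching between scenario and emotion styles is a matter of changing the style argument, subject to which styles a given voice and language support.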

The new voice styles are available in English and Chinese, while the emotion styles are available for English, Chinese, and Brazilian Portuguese, though not all of the styles are available in all languages. Microsoft notes that the styles can be customized through the Custom Neural Voice feature within Microsoft Speech Studio, allowing brands to build unique voices that benefit from the new scenarios.

Microsoft is effectively going toe to toe with Google, which last year debuted 31 new AI-synthesized WaveNet voices and 24 new standard voices in its Cloud Text-to-Speech service (bringing the total number of WaveNet voices to 57). It has another rival in Amazon, which recently launched a service — Brand Voice — that taps AI to generate custom spokespeople, and which offers a number of voice styles and emotion styles through Amazon Polly, Amazon’s cloud offering that converts text into speech.


VentureBeat
