Humanizing Machine-Generated Speech

A UX Designer's Journey in the Audio News Market

 
 
 

Type: Work, Baidu App

Duration: 3 months, 2018 

Role: Project Design Lead / UX Designer

Background: coexistence of opportunities and challenges

Text-to-speech (TTS) technology converts text into speech. That machines can generate speech is no longer groundbreaking, but when will machines sound human? As a UX designer at Baidu, I had the opportunity to work on humanizing machine-generated speech in the audio news market.

In 2017, TTS technology was introduced to the Baidu App, allowing users to listen to the news, resting their eyes and freeing up their hands. However, machine-generated audio news was far from natural and human-like. While functional, the Audio News product was not enjoyable, and most users did not adopt it. In 2018, TTS technology made a significant leap from concatenative synthesis to neural network-based synthesis, resulting in more personalized and natural-sounding voices with reduced production costs and time. As the design lead of Baidu App's Audio News product vertical, I initiated projects to enhance the listening experience through content and sound design, capitalizing on this technological advancement.

User problems: machine-generated audio news sounds far from pleasing

I collaborated with the User Research team to understand how users perceived the current Audio News. We engaged regular audio content listeners and conducted interviews and surveys. Our findings revealed that:

User problem 1

Users are dissatisfied with machine-generated voices and desire more human-like audio shows.

We currently provide just two voice options, one male and one female, both perceived as serious and lacking personality. Many users find these voices unnatural and monotonous, and research shows that user preferences for voice traits vary widely.

Users with established listening habits often use multiple apps, and they prefer podcasts produced by real people over Baidu App's TTS news. To better understand what makes these podcasts better, I conducted competitive research and a heuristic evaluation. I found that they usually have personalized brand images, focused content in specific subject areas, well-designed audio production, and hosts whose style aligns with the content. These qualities make them unique and memorable.

Therefore, we need to offer users better-quality, more personalized voice options, and it might be helpful to learn from radio shows or podcasts to create our 'AI News Podcasts' empowered by TTS.

User problem 2

Content is too lengthy to hold user attention.

Many users struggle to concentrate during our current TTS news broadcasts, often becoming distracted and feeling lost. They prefer shorter, more efficient content for a quicker understanding. Research on traditional news reveals the ideal length for a news brief is 1-3 minutes. In contrast, Baidu's articles range from 800-2,000 words, taking 4-10 minutes to read at a normal pace—far longer than what users are accustomed to for newscasts.
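The length gap above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a narration rate of 200 words per minute, which is the rate implied by the figures quoted above (800 words in 4 minutes, 2,000 words in 10 minutes), and the function names are hypothetical rather than part of any Baidu tooling.

```python
# Assumption: 200 words per minute, the rate implied by the figures in the
# text (800 words -> 4 minutes, 2,000 words -> 10 minutes).
WORDS_PER_MINUTE = 200


def estimated_minutes(word_count: int, words_per_minute: int = WORDS_PER_MINUTE) -> float:
    """Estimate listening time in minutes for a text of `word_count` words."""
    if word_count < 0 or words_per_minute <= 0:
        raise ValueError("word_count must be >= 0 and words_per_minute must be > 0")
    return word_count / words_per_minute


def fits_brief_length(word_count: int, max_minutes: float = 3.0) -> bool:
    """Check whether a text fits the 1-3 minute news-brief ceiling cited above."""
    return estimated_minutes(word_count) <= max_minutes
```

Under this assumed rate, a text has to stay at or below roughly 600 words to fit within a 3-minute brief, so even the shortest articles in the 800-2,000 word range run over.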

User problem 3

In busy-hand scenarios, users prefer uninterrupted listening. 

The current news feed is a chaotic mix of various topics, requiring users to search for what to play next. This disrupts the experience, especially in busy-hand situations. Many users prefer content collections focused on specific subject areas, so we should offer collections with explicit, specific subject areas.

Design Goal

Based on our research findings, I led a design sprint with two other designers and identified high-level design goals to make Audio News more engaging and enjoyable. We aimed to improve:

  • Podcast humanization: Bring the quality and diversity of conventional podcasts to our machine-generated news podcasts.

  • Information retrieval efficiency: Enhance content conciseness to ensure users can quickly access key news content.

  • Continuous listening experience: Create a flow, interaction, and interface for a seamless listening experience.

Key Metrics

We collaborated with product managers and user researchers to determine key metrics:

  • User Satisfaction

  • Average Playback per User

  • Average Listens per User

Outcome: introducing the AI Podcasts

During the design sprint, numerous ideas were generated, and the top proposals were selected, prototyped, tested, evaluated, and iterated upon. Consequently, we introduced the new AI Podcasts.

Solution 1

Humanization and sound design

We launched three thematic AI Podcasts, each focusing on a popular news category. To enhance the human-like quality, we carefully selected a suitable voice for each category, branding them as virtual hosts. Each podcast comes with unique scripts and music, creating distinctive and memorable podcast shows.

We designed humanized scripts for each podcast, including introductions, segues, and outros. Introductions set the tone of the show, identify the podcast, introduce the host, and help create a recognizable brand image. Segues transition between content, while outros wrap up episodes, reinforce the brand image, and leave listeners eager for the next episode.
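The episode structure described above (introduction, news items joined by segues, outro) can be sketched as a simple assembly step. This is a minimal illustration, not the production pipeline, which is not described here; the function name and parameters are hypothetical.

```python
from typing import List


def assemble_episode(intro: str, briefs: List[str], segue: str, outro: str) -> List[str]:
    """Return the ordered script segments to be voiced for one episode.

    The intro opens the show, a segue is placed between consecutive news
    briefs (never before the first or after the last), and the outro closes.
    """
    segments = [intro]
    for index, brief in enumerate(briefs):
        if index > 0:
            segments.append(segue)  # transition between two news items
        segments.append(brief)
    segments.append(outro)
    return segments
```

For example, an episode with two briefs yields five segments in the order intro, first brief, segue, second brief, outro.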

Solution 2

Improve information retrieval efficiency through news briefings

AI Podcasts default to broadcasting news briefings. We generate these briefings using the article abstracts provided by the authors, resulting in news briefs that range from 1 to 3.5 minutes in length—more in line with users' preferences. If users wish to listen to the full article, they can do so by clicking a button on the card.

Solution 3

Interface designed for continuous listening in busy-hand scenarios

We transformed the interface layout from a list to large cards, enhancing the immersive listening experience. This streamlined interface reduces information noise and allows users to see the main content at a glance. To navigate to the next or previous news item, users can swipe up or down, a gesture better suited to busy-hand scenarios than tapping on list items. Enlarging the card also makes the progress bar and buttons easier to tap.

Impact: metrics and beyond

AI Podcasts launched in October 2018 and brought significant growth to our Audio News product within just one month. Average listens grew by 107%, with 44% of news content being played from our three primary AI News Podcast programs. Average playback per user increased by 80.9%, and daily active users increased by 113.2%.

This design-led project was a success both in terms of data and impact. It offered fresh perspectives to the Audio News product team, leading to the introduction of more personalized podcast programs. Our technology supplier, the TTS technology team, appreciated and showcased this project in external client presentations. Within Baidu's design community, I presented this project, sharing my unique approach to tackling UX challenges, and received an Outstanding Achievement Award from the User Experience department.

My contributions 

  • Initiated the design project by identifying overlooked problems and opportunities, bridging the gap between technological innovation and user-centric end products.

  • Managed the project from kick-off to launch, leading a team of one visual designer and one interaction designer.

  • Led cross-functional and cross-organizational communication and collaboration, establishing shared goals among diverse stakeholders and involving them throughout the design process.

  • Prototyped innovative audio-focused experiences and rapidly tested assumptions with actual users.

Key skills I utilized: product strategy, project management, stakeholder management, roadmap planning, prioritization, UX research, interaction design, content design, sound design, defining metrics, and measuring impact through quantitative and qualitative data.

 
