Envisioning Future Voice Interaction
Exploring user experiences with wake-word-free voice commands
Type: Work, Baidu App Voice Version
Time: Sep - Nov 2017
Role: Lead UX Designer
Overview
The future of human-computer interaction extends beyond graphical user interfaces and touch inputs. Multimodal interaction, integrating graphics and voice, is an essential area of exploration. At Baidu App, I explored and experimented with multimodal interaction possibilities for mobile devices.
User problem
Upon auditing the Baidu App, I found that a variety of audio content types, including news, books, music, and podcasts, were dispersed across different players. This setup fragmented the listening experience. Further analysis showed significant user overlap among these content types, suggesting the necessity of a unified player to enhance the listening experience.
User research revealed that many users listen to audio content while engaged in tasks that occupy their hands, known as busy-hand scenarios. In such scenarios, users require a complementary way to interact with the product. Introducing voice interaction to our listening experience is one potential solution.
Initial design proposal
I led a project exploring the integration of voice interaction into the Baidu App. The proposal attracted considerable interest from the Baidu App product line. In collaboration with the product team, we developed a product design showcase that captured executive interest and secured strong sponsorship for the project.
Highlights of the proposal:
Allow seamless listening to all audio content in the Baidu App via a unified audio player.
Enable intuitive and efficient voice-activated playback controls, searching, and easy content access.
User testing
This project introduced a new interaction mode to the Baidu App, so we gathered user feedback during product development. Our testing revealed two key insights:
Insight 1: Inefficiency of voice commands in playback controls
In a standard Voice User Interface (VUI), a Wake-Up-Word (WUW) such as "Hey Siri" or "Hey Google" is required before issuing a command. For Baidu's VUI, it is "Xiaodu Xiaodu." Consider the simple action of skipping to the next track. With touch, it is a single tap; with voice, users have to say "Xiaodu Xiaodu, next," which adds four extra syllables to a one-word command. This contrast between the swift tap and the lengthier voice command underscores the inefficiency of voice for basic playback controls.
Insight 2: Context-awareness in voice queries
Users often engage with content through context-aware questions. Besides basic playback controls, they sometimes search for information related to the current news, such as unfamiliar terms or individuals mentioned in the article. This behavior indicates a need for context-awareness in the app's search functionality, allowing users to delve deeper into topics of interest seamlessly.
Design iteration
User testing results led us to question the necessity of the Wake-Up-Word (WUW). We asked: Can its use be reduced, or even eliminated in certain cases? What are the criteria for requiring it? Guided by user feedback, we assessed action and information requests based on frequency, ease of touch use*, relevancy to the current context, and feasibility without WUW. This assessment helped us categorize requests into two types: those that can be executed without WUW and those that still require it.
*Ease of touch use refers to how simple and intuitive it is for users to perform an action through touch in the graphical user interface.
Introducing the Baidu App Voice Version
After this iteration, the product's voice interaction is more efficient and feels more natural.
No Wake-Up-Word
For playback controls like pause, play, next, previous, and volume control, users can now say their commands directly without the WUW. They can also ask questions about the news currently playing without the WUW, allowing for faster and easier access to supplemental information.
With Wake-Up-Word
For accessing information, content, and services not related to the ongoing audio, users should say the WUW followed by their request. This distinguishes deliberate requests from background speech, which helps prevent misunderstandings and interruptions. The WUW remains essential for broader inquiries to avoid false responses and ensure a smooth user experience.
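The split between the two request types can be illustrated with a small routing sketch. This is a hypothetical illustration, not the logic shipped in the Baidu App: the command phrasings, the `audio_playing` condition, and the `is_about_current_content` flag (standing in for whatever relevance check the product used) are all assumptions made for the example.

```python
from dataclasses import dataclass

WAKE_WORD = "xiaodu xiaodu"  # the WUW named in the case study

# Assumed phrasings for the playback controls listed above.
PLAYBACK_COMMANDS = {"pause", "play", "next", "previous", "volume up", "volume down"}


@dataclass
class Route:
    handler: str  # "playback", "context_query", "general", or "ignore"
    request: str


def route_utterance(utterance: str, audio_playing: bool,
                    is_about_current_content: bool = False) -> Route:
    """Decide how to handle an utterance, with or without the wake word.

    is_about_current_content is a hypothetical stand-in for a relevance
    check against the audio that is currently playing.
    """
    text = utterance.strip().lower()
    woke = text.startswith(WAKE_WORD)
    request = text[len(WAKE_WORD):].lstrip(" ,") if woke else text

    # Playback controls: accepted without the WUW while audio is playing.
    if request in PLAYBACK_COMMANDS and (woke or audio_playing):
        return Route("playback", request)
    # Questions about the current content: also accepted without the WUW.
    if is_about_current_content and (woke or audio_playing):
        return Route("context_query", request)
    # Everything else needs the WUW, so background speech is ignored.
    if woke:
        return Route("general", request)
    return Route("ignore", request)


print(route_utterance("next", audio_playing=True))
print(route_utterance("Xiaodu Xiaodu, what's the weather", audio_playing=True))
print(route_utterance("what's the weather", audio_playing=True))
```

In this sketch, an unrelated request spoken without the WUW is dropped rather than answered, which mirrors the goal of avoiding false responses.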
Impact
The project was presented by Baidu founder and CEO Robin Li at Baidu World 2017, the company's most important annual conference. At the beginning of his presentation, Robin said:
"In the future, no Wake-Up-Word is the way to natural voice interaction."
The product was later launched in app stores in November 2017. Our project team was rewarded with two company awards:
2017 Baidu Outstanding Achievement Award
2017 Baidu Most Creative Award