Enabling users to interact with their voice, have a conversation, or take actions
This is a type of screenless UX, well suited to delegating low-risk or reversible tasks. It demands little cognitive effort and enables multitasking, PRO Max style.
More recently, a hidden behaviour shift has been taking place...
People are relying on voice more and more: taking walks with ChatGPT to think out loud, or dictating with Whisprflow instead of click, click, type.
We're seeing voice being added to our day-to-day applications, not just standalone devices like the Echo Dot.
But why now? Because the technology is finally here. Our fear of "Uh-oh, I didn't quite get that. Try again" is a thing of the past. Audio interpretation has gotten faster and much more advanced - take AirPods' live translation, for example.
In fact, it's so much better that users' expectations have been raised again (for the first time since Siri's 2011 launch): they expect accuracy, context awareness, and a seamless handoff between speaking, writing, and editing. Done right, voice input can make a product feel like a real partner.
Let's dive into some key takeaways.
Users only trust voice if they see immediate feedback and an accurate transcription. Break that once, especially during onboarding or the first try, and you've lost that user for a couple of years.
Perplexity
has a voice input mode with fancyyy visual feedback, reminiscent of the audio-visualizer days.
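That kind of live feedback is cheap to prototype in the browser. Here's a minimal sketch using the standard Web Audio API; it's an illustration, not Perplexity's implementation, and `drawBars` is a made-up renderer.

```ts
// Minimal sketch of live voice-level feedback using the standard Web Audio
// API. `drawBars` is a hypothetical renderer; the rest is the real browser
// API. Note: browsers require a user gesture (e.g. tapping the mic button)
// before audio capture starts.
async function startVisualizer(canvas: HTMLCanvasElement): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(stream);
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 256;                    // small FFT: coarse but fast bars
  source.connect(analyser);

  const levels = new Uint8Array(analyser.frequencyBinCount);
  const draw = () => {
    analyser.getByteFrequencyData(levels);   // 0-255 energy per frequency bin
    drawBars(canvas, levels);                // hypothetical: paint the bars
    requestAnimationFrame(draw);             // repaint every frame (~60 fps)
  };
  draw();
}

// Hypothetical renderer: one vertical bar per frequency bin.
function drawBars(canvas: HTMLCanvasElement, levels: Uint8Array): void {
  const g = canvas.getContext("2d")!;
  g.clearRect(0, 0, canvas.width, canvas.height);
  const barWidth = canvas.width / levels.length;
  levels.forEach((v, i) => {
    const h = (v / 255) * canvas.height;
    g.fillRect(i * barWidth, canvas.height - h, barWidth - 1, h);
  });
}
```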
A voice system is only as good as its editing flow. Punctuation, filler removal, and intent detection make it usable.
Alexa
listens for a wake phrase, "Hey Alexa", and initiates actions like playing music, checking the weather, or controlling your home devices. Users speak their commands, and while Alexa interprets them, the device shows a vibrant blue ring as feedback. The interaction is hands-free and optimized for ambient environments like kitchens or living rooms.
Whisprflow
corrects grammar and removes the erms and umms. Everything that existed visibly in text should now happen invisibly in voice.
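Whisprflow's actual pipeline isn't public, and modern tools most likely do this with a language model, but a toy sketch of the filler-removal step makes the idea concrete. The regex and function below are assumptions for illustration.

```ts
// Toy sketch of one editing-flow step: stripping filler words from a raw
// transcript. A naive regex pass, purely illustrative; real products need
// far more context-awareness than this.
const FILLERS = /\b(um+|uh+|erm+|hmm+)\b[,.!?]?\s*/gi;

function cleanTranscript(raw: string): string {
  return raw
    .replace(FILLERS, "")       // drop the erms and umms
    .replace(/\s{2,}/g, " ")    // collapse leftover double spaces
    .trim();
}

cleanTranscript("Um, so add erm milk to the, uh, shopping list");
// -> "so add milk to the, shopping list"
// The stray comma shows exactly why regexes alone aren't enough.
```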
We’re past the “dedicated device” era. Voice works best when it’s baked into daily tools (calls, docs, chats) instead of living on a smart speaker island. Embedding it reduces friction and raises adoption.
Siri
is activated via a long press or by voice with "Hey Siri". Users can ask general questions like "What's the weather this week?" or request context-aware actions like "Add hiking to my calendar for 7 AM". The assistant integrates with apps, pulling relevant information and executing tasks while preserving continuity across follow-ups.
Subtle guidance (“Say ‘next step’”) empowers without overwhelming. Voice isn’t just about commands anymore: people use it to reason, brainstorm, and multitask, so design for messy, exploratory speech, not just short queries.
Arc
(cue the elevator music) enables users to speak during a phone call and have the assistant surface web results live. The AI listens mid-call, processes the query, and presents a summary or options without leaving the interface. This is an embedded voice interaction that functions within a live context, augmenting search with real-time voice input.
When designing a voice AI UX, the interface itself is minimal, but it has to account for multiple interaction and feedback states: idle, listening, processing, speaking, and error.
The aim when working on a voice interaction is to get into code and a live experience as soon as possible, to feel out all of those states and iron out the nuances.
As a designer, there isn't much interface to draw, maybe a button and a visualizer, but a lot of design goes into the part of the interaction that's invisible.
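As a rough illustration, here's what that invisible part can look like as a state model. The states and events below are assumptions for this sketch, not any product's actual API, but most voice UIs reduce to something similar.

```ts
// Illustrative state model for a voice interaction. Every state needs its
// own visible feedback: idle -> mic button, listening -> live waveform,
// processing -> spinner, speaking -> playback indicator, error -> retry.
type VoiceState = "idle" | "listening" | "processing" | "speaking" | "error";
type VoiceEvent = "pressMic" | "silenceDetected" | "resultReady" | "ttsDone" | "failed" | "dismiss";

const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle:       { pressMic: "listening" },
  listening:  { silenceDetected: "processing", pressMic: "idle", failed: "error" },
  processing: { resultReady: "speaking", failed: "error" },
  speaking:   { ttsDone: "idle", pressMic: "listening" },  // barge-in: interrupt playback
  error:      { dismiss: "idle", pressMic: "listening" },  // always offer a way back
};

function next(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;  // ignore events that don't apply here
}
```

Walking a live prototype through a table like this is how the nuances (barge-in, timeouts, error recovery) actually surface.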
Voice interfaces are soon going to evolve from simple, reactive tools into proactive, conversational systems. With advancements in memory and context modeling, future assistants will anticipate user needs and suggest helpful actions without being prompted.
Imagine this: your room is connected to a smart voice assistant that says, "It's getting quite warm; should I turn on the AC? Also, it's late; would you like me to dim the lights?"
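Purely as a thought experiment, that scenario could reduce to a rule like the one below; `turnOnAC`, `dimLights`, and the thresholds are all invented for the sketch.

```ts
// Hypothetical sketch of a proactive trigger: the assistant watches simple
// signals and offers an action instead of waiting for a command.
const turnOnAC = () => console.log("AC on");          // stand-in for a real smart-home call
const dimLights = () => console.log("lights dimmed"); // stand-in for a real smart-home call

interface Suggestion {
  prompt: string;       // what the assistant says out loud
  action: () => void;   // what runs only if the user says yes
}

function proactiveSuggestions(tempC: number, hour: number): Suggestion[] {
  const suggestions: Suggestion[] = [];
  if (tempC >= 28) {
    suggestions.push({ prompt: "It's getting quite warm; should I turn on the AC?", action: turnOnAC });
  }
  if (hour >= 22) {
    suggestions.push({ prompt: "It's late; would you like me to dim the lights?", action: dimLights });
  }
  return suggestions;   // key UX rule: suggest and ask, never act silently
}
```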
Designing for that kind of proactive, ambient interaction is the next challenge for designers and product teams in the new AI paradigm.