DI

Building Voice Interfaces People Actually Want to Use

Sean Johnson

Partner at DI and Founder Equity. Kellogg professor. Very pale.

Voice is increasingly becoming a preferred method of interacting with devices. It’s easy to see why. Voice-based interfaces are:

  • Fast. On average, users can speak 150 words per minute but can type only 40.
  • Easy. Users don’t have to stop what they’re doing, so voice interfaces fit into our lives more seamlessly.
  • Personalized. Answers can be tailored to the user’s location, context, and previous interactions.

Thanks to the many APIs and platforms now available, voice interfaces are relatively easy to stand up. Most of them, however, don’t get used: they solve nonexistent problems, or they solve real ones in ways that frustrate users. The result is dismal retention and ongoing use for most skills.

At DI, we’ve spent considerable time over the last year wrapping our heads around the implications of the voice revolution. Having built voice interfaces for telecommunications companies, meal delivery startups, and more, we’ve learned what works in the space and which pitfalls to avoid when crafting a usable voice interface.

Prototype early and often.

By far the most important piece of advice is to prototype early and often. Your voice interface will be much different than your web or mobile app.

You’ll likely discover that a step of your process is super clunky, or that additional steps are necessary to give the user the information they need. Plan for several iterations.

Avoid bad reviews.

Reviews play a big role in rankings and discoverability on any platform, and the big voice platforms are no different. Even if you respond to feedback and fix problems quickly, customers rarely update their reviews after the fact. Your best bet is to head bad reviews off at the pass.

Set good expectations.

One great way to avoid unnecessary bad reviews is to tell your users what they will and will not be able to do with your skill. Put it in the description of the skill itself. Put it in Alexa Cards. Put it everywhere else you do promotion.
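As a minimal sketch of putting scope statements in the skill itself, the function below builds an Alexa-style JSON response whose spoken text and Simple card (`type`, `title`, `content`) both say what the skill can and can’t do. The skill name, capabilities, and the helper name `scope_response` are hypothetical examples, not part of any SDK.

```python
from __future__ import annotations


def scope_response(skill_name: str, can_do: list[str], cannot_do: list[str]) -> dict:
    """Build an Alexa-style JSON response whose speech and card both state the skill's scope."""
    scope = (
        f"{skill_name} can {', '.join(can_do)}. "
        f"It can't {' or '.join(cannot_do)}."
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": scope},
            # A Simple card shows a title and plain-text content in the Alexa app.
            "card": {"type": "Simple", "title": f"What {skill_name} can do", "content": scope},
            # Keep the session open so the user can act on what they just heard.
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": "What would you like to do?"}},
            "shouldEndSession": False,
        },
    }
```

Returning the same sentence in speech and on the card keeps the two channels from drifting apart as the copy evolves.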

Identify edge cases.

Your users are going to behave in strange ways. Expect it. Write graceful error messages that guide users back onto the right track. Consider how the app will respond when users provide all, some, or none of the necessary information.
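One way to cover the all/some/none cases is to decide the next turn from whichever slots are filled. The sketch below assumes a hypothetical ordering skill with three required slots; Alexa’s built-in dialog management can automate parts of this, but the decision logic is the same.

```python
from __future__ import annotations

# Hypothetical required slots for an example ordering skill, in the order we ask for them.
REQUIRED_SLOTS = {
    "item": "What would you like to order?",
    "size": "What size would you like?",
    "address": "What address should we deliver to?",
}


def next_step(slots: dict[str, str | None]) -> tuple[str, str]:
    """Return (action, speech) given whatever slots the user has filled so far."""
    missing = [name for name in REQUIRED_SLOTS if not slots.get(name)]
    if not missing:
        # All information provided: confirm before acting.
        summary = f"a {slots['size']} {slots['item']} to {slots['address']}"
        return "confirm", f"Okay, {summary}. Should I place the order?"
    question = REQUIRED_SLOTS[missing[0]]
    if len(missing) == len(REQUIRED_SLOTS):
        # Nothing useful provided: orient the user before asking.
        return "elicit", "I can place a delivery order for you. " + question
    # Partial information: ask only for the next missing piece.
    return "elicit", question
```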

Be careful about wording.

Phrasing that sounds good on paper often doesn’t work when spoken aloud. For example, if you ask “would you like X or Y?”, don’t be surprised if users respond with “yes.” Keep your responses short: Alexa speaks more slowly than a typical person does.
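A bare “yes” to an either/or question carries no choice, so the skill has to recover. Here is a minimal sketch, assuming simple substring matching against hypothetical option names; a production skill would normally rely on its intent model for this.

```python
from __future__ import annotations

# Assumed, non-exhaustive list of affirmative replies.
AFFIRMATIONS = {"yes", "yeah", "yep", "sure", "ok", "okay"}


def interpret_choice(utterance: str, options: list[str]) -> tuple[str | None, str | None]:
    """Map a reply to an either/or question onto one option, or return a reprompt."""
    text = utterance.lower().strip(" .!?")
    matches = [opt for opt in options if opt.lower() in text]
    if len(matches) == 1:
        return matches[0], None
    listed = " or ".join(options)
    if text in AFFIRMATIONS:
        # "Yes" to "X or Y?" picks nothing: restate the options explicitly.
        return None, f"Sure. Which one: {listed}?"
    return None, f"Sorry, I didn't catch that. You can say {listed}."
```

Better still, phrase the original question so “yes” isn’t a natural answer, e.g. “Which would you like: thin crust or deep dish?”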

Think about brand.

Just because an interface doesn’t have any visual elements doesn’t mean it’s not an opportunity to build or reinforce your brand.

Ask yourself what your product or service’s personality is. How would it talk to people? Craft your copy to match.

Naming matters.

Don’t assume your users will know how to pronounce the name of your skill, or that it sounds the same in every accent. Test it. Also focus on the words consumers would use when trying to find your skill. Avoid jargon: treat the words in your name as keywords, because that’s what they are.

Make sure you have analytics set up.

Just because it’s a voice interface doesn’t mean you can’t capture data on how users interact (or don’t) with it. Study the most successful paths users take, and figure out how to funnel more of your users through those paths.
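To find the most successful paths, group logged intents by session and rank the sequences that reached your goal. This sketch assumes a hypothetical event log of `(session_id, intent_name)` pairs in chronological order and a hypothetical goal intent name; your analytics tool’s export format will differ.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def top_successful_paths(
    events: list[tuple[str, str]], goal_intent: str, n: int = 3
) -> list[tuple[tuple[str, ...], int]]:
    """Rank the intent sequences of sessions that reached the goal intent.

    events: (session_id, intent_name) pairs, in the order they were logged.
    """
    sessions: dict[str, list[str]] = defaultdict(list)
    for session_id, intent in events:
        sessions[session_id].append(intent)
    # Only sessions that reached the goal count as successful paths.
    successful = Counter(tuple(path) for path in sessions.values() if goal_intent in path)
    return successful.most_common(n)
```

Comparing these paths against sessions that never reached the goal shows where users drop off.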

Obsess over “completed query percentage.”

By far the biggest predictor of retention is the percentage of queries successfully resolved on the first try. Users get frustrated with buggy voice interfaces much more quickly than with other modalities.

Tracking and iterating to improve this percentage is a great way to boost retention, which can have a huge impact on growth. In fact, a 20% increase in retention results in nearly 4x the growth rate.
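Tracking the metric starts with a clear definition. A minimal sketch, assuming you log each user request with how many utterances it took and whether it was ultimately fulfilled (the `Request` record is hypothetical): a request counts as completed only if it was resolved on the first attempt.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    attempts: int   # utterances it took; 1 means the skill understood the first time
    resolved: bool  # whether the skill ultimately fulfilled the request


def completed_query_percentage(requests: list[Request]) -> float:
    """Share of requests resolved on the first try, as a percentage."""
    if not requests:
        return 0.0
    first_try = sum(1 for r in requests if r.resolved and r.attempts == 1)
    return 100.0 * first_try / len(requests)
```

For example, four requests where two succeed on the first try, one succeeds only after a rephrase, and one fails yield 50.0.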

Add functionality.

While you want to constrain your application to a limited set of use cases, you’ll often find places where users go off the rails, or identify use cases you hadn’t thought about before. Adding this functionality can increase your completed query percentage, which increases satisfaction.

Keep user requirements simple.

Every additional piece of information you require reduces your completed query percentage. Figure out ways to make the interaction simpler.

Domino’s, for example, lets users save a default order in their account, so they can say “Alexa, order me a pizza” and have the order completed without being asked for any additional information.
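The pattern behind a default order is to fall back on stored preferences before asking a question. This is a sketch of the idea only, not Domino’s implementation; the `Account` record and `handle_order` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Account:
    user_id: str
    default_order: list[str] | None = None  # hypothetical "usual order" saved by the user


def handle_order(account: Account, spoken_items: list[str]) -> dict:
    """Decide what to do with an order request, asking questions only as a last resort."""
    if spoken_items:
        # The user named what they want: order exactly that.
        return {"action": "place_order", "items": spoken_items}
    if account.default_order:
        # "Alexa, order me a pizza" with nothing else: fall back to the saved default.
        return {"action": "place_order", "items": list(account.default_order)}
    # No default on file: this is the only path that costs an extra turn.
    return {"action": "ask", "speech": "What would you like to order?"}
```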

If a request does require multiple steps, make sure you keep the session “open” – customers get frustrated if they need to invoke your app for every request.
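In Alexa’s JSON response format, the session stays open when `shouldEndSession` is `false`, and a `reprompt` nudges the user if they go quiet. The helpers below are a minimal sketch that builds those responses by hand; the names `ask` and `tell` are our own, and the official ASK SDKs provide response builders that do the same job.

```python
from __future__ import annotations


def _speech(text: str) -> dict:
    return {"type": "PlainText", "text": text}


def ask(speech: str, reprompt: str) -> dict:
    """Respond and keep the session open so the user can answer without re-invoking the skill."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": _speech(speech),
            # Spoken if the user says nothing after the first prompt.
            "reprompt": {"outputSpeech": _speech(reprompt)},
            "shouldEndSession": False,
        },
    }


def tell(speech: str) -> dict:
    """Respond and close the session once the task is done."""
    return {
        "version": "1.0",
        "response": {"outputSpeech": _speech(speech), "shouldEndSession": True},
    }
```

Use `ask` for every intermediate step of a multi-step request and `tell` only for the final answer.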

Use text to streamline onboarding.

There is not currently a fantastic way of linking user accounts with an Alexa skill. The best approach we’ve seen is to leverage text messaging.

Ask the user for their phone number (which, unlike an email address, is likely to be captured correctly by voice) and send them a text message to verify their account and finish onboarding.
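One possible shape for the verification step is a short-lived, single-use code sent by text, which the user then reads back to the skill or enters on a linked web page. This is a sketch under assumptions: the `send_sms` callback stands in for whatever SMS provider you use, the 10-minute lifetime is arbitrary, and the in-memory store would be a shared database in a real skill.

```python
from __future__ import annotations

import hmac
import secrets
import time
from typing import Callable

CODE_TTL_SECONDS = 600  # assumption: codes are valid for 10 minutes

# phone number -> (code, time issued). In-memory for illustration only.
_pending: dict[str, tuple[str, float]] = {}


def start_verification(phone: str, send_sms: Callable[[str, str], None]) -> None:
    """Generate a six-digit code, remember it, and text it to the user."""
    code = f"{secrets.randbelow(10**6):06d}"
    _pending[phone] = (code, time.time())
    send_sms(phone, f"Your verification code is {code}")


def verify(phone: str, submitted: str, now: float | None = None) -> bool:
    """Check a submitted code. Codes are single-use: any attempt consumes them."""
    entry = _pending.pop(phone, None)
    if entry is None:
        return False
    code, issued_at = entry
    now = time.time() if now is None else now
    if now - issued_at > CODE_TTL_SECONDS:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(code, submitted)
```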

Place subtle triggers into your skills to drive retention.

One of the biggest issues with driving retention is the inability to send notifications – every interaction must be initiated by the user.

One sneaky way to train users is to remind them, at the end of a completed query, to use the skill again. Something as simple as “here is your daily briefing” or “check back tomorrow” can provide subtle, even subconscious cues to your users.

Leverage existing channels to drive adoption.

The voice platforms’ skill stores won’t yet send you enough traffic to drive significant volume, and paid ad networks for voice don’t exist yet either. Your best bet is to leverage your existing assets: social, email, your website, and others.

You probably need to augment off-the-shelf tools with your own.

While out-of-the-box APIs can get you a long way, they’re often insufficient. For example, a product catalog probably needs to be backed by fuzzy, full-text search, because people don’t say full product names, such as the full name of a Purely menu item. Instead, they might say “add the steak to my cart.” They might also list multiple items in one breath. Alexa will transcribe what they say, but an API without fuzzy matching won’t find anything that matches the exact menu title.
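To illustrate the idea, the sketch below uses Python’s standard-library `difflib` as a stand-in for a real full-text search engine. It splits an utterance on “and” and commas, drops filler words, and picks the catalog item whose name covers most of the spoken words. The menu names, filler-word list, and thresholds are hypothetical; a production catalog would more likely use a search engine with fuzzy queries.

```python
from __future__ import annotations

import difflib
import re

# Filler words that carry no product information (an assumed, minimal list).
STOPWORDS = {"a", "add", "an", "and", "cart", "i", "me", "my", "order",
             "please", "some", "the", "to", "want", "with"}


def tokens(text: str) -> list[str]:
    """Lowercase words minus filler."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def best_match(phrase: str, catalog: list[str], cutoff: float = 0.8) -> str | None:
    """Return the catalog item whose words best cover the spoken phrase, if any."""
    words = tokens(phrase)
    if not words:
        return None
    best_item, best_score = None, 0.0
    for item in catalog:
        item_words = tokens(item)
        # A spoken word is a hit if it is close to some word in the item name.
        hits = sum(1 for w in words if difflib.get_close_matches(w, item_words, n=1, cutoff=cutoff))
        score = hits / len(words)
        if score > best_score:
            best_item, best_score = item, score
    # Require at least half the spoken words to match before accepting.
    return best_item if best_score >= 0.5 else None


def match_items(utterance: str, catalog: list[str]) -> list[str]:
    """Split an utterance like 'the steak and the salmon' into phrases and match each."""
    phrases = re.split(r",|\band\b", utterance.lower())
    return [item for item in (best_match(p, catalog) for p in phrases) if item is not None]
```

With a hypothetical menu of “Grass-Fed Steak with Chimichurri”, “Roasted Chicken with Herbs”, and “Wild Salmon with Quinoa”, the utterance “add the steak and the salmon to my cart” maps to the steak and salmon dishes, while an unknown item such as “burrito” matches nothing.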

Start innovating with voice interfaces today.

It’s still early in the lifecycle of voice interfaces, and new patterns will emerge as customers and brands figure out what works and what doesn’t.

But being early to any platform usually creates a virtuous cycle. You learn faster than your competitors, you build an active install base, and you dig a moat. The friction of getting an Alexa skill installed becomes an advantage once your skill is on a user’s device: if you can build a habit around it, that user is highly unlikely to switch.

DI would love to talk with you about ways your customers could leverage voice to interact with your products or services. Don’t hesitate to reach out with questions or to discuss how to bring voice interfaces to your organization.

Creating Voice Interfaces People Actually WANT To Use from Digital Intent
