The first Amazon Echo, back in 2014, was presented as a device for a few simple things: playing music, asking basic questions, checking the weather. Since then, Amazon has found some new things for people to do, such as controlling smart home devices. But ten years later, Alexa is still mainly used to play music, ask basic questions, and check the weather. This is mainly because even though Amazon has made Alexa ubiquitous in devices and homes everywhere, it has never convinced developers to care about it.
Alexa was never intended to have an app store. Instead, it had “skills” that Amazon hoped developers would use to connect Alexa with new features and information. Developers weren’t supposed to build their own things on top of Alexa’s operating system; they were supposed to build new things for Alexa to do. The difference is subtle but important. Our phones are mostly a series of unrelated services: Instagram is a universe completely separate from TikTok and Snapchat, calendar apps and Gmail. That just doesn’t work for Alexa or any other successful assistant. If it knows your to-do list but not your calendar, or your favorite type of pizza but not your credit card number, there’s not much it can do. To perform tasks for you, it needs access to everything and all the necessary tools at its disposal.
The Verge examines how far the voice assistant has come in a decade: its successes, failures, and potential future.
In Amazon’s dream world, where “ambient computing” is perfect and everywhere, all you need to do is ask Alexa a question or give it a command: “Find me something fun to do this weekend.” “Book my train to New York next week.” “Get me up to speed on deep learning.” Alexa will have access to all the apps and information sources you need, but you’ll never have to worry about that; Alexa will just handle it however it needs to and bring you the answers. There are a thousand complicated questions about how this would actually work, but that’s still the core idea.
“With Alexa Skills, developers could quickly and easily create voice-driven experiences, opening up a whole new way for developers and brands to interact with customers,” Amazon spokeswoman Jill Tornifoglio said in a statement. Customers use them billions of times a year, she said, and as the company implements generative artificial intelligence, “we can’t wait to see what comes next.”
In hindsight, Amazon’s idea was largely the right one. Over the years, OpenAI and other companies have also been trying to build their own third-party ecosystems around chatbots, which is another take on the idea of an interactive web interface. But for all its prescience about the AI revolution, Amazon has never figured out how to make skills work. It never solved core developer problems, it never cracked the UI, and it never found a way to show people everything their Alexa device could do if they just asked.
Amazon has certainly made every effort to make skills a reality. The company constantly provided developers with new tools, paid them with AWS credits and cash when their skills were used (though it recently stopped doing so), and tried to make skill development virtually hassle-free. And on some level, all that effort has paid off: Amazon says there are more than 160,000 skills available on the platform. This pales in comparison to the millions of apps available in smartphone app stores, but it’s still a big number.
However, the interface for finding and using all these skills has always been a mess. Let’s take one simple example: if you ask Alexa to order you a pizza, it might tell you it has some skills for that and recommend Domino’s. (Why would Amazon choose Domino’s over Pizza Hut, DoorDash, or any other pizza ordering service? Great question. I have no idea.) You answer yes. “This is Domino,” Alexa says. And a moment later: “This is the Domino’s Skill by Domino’s Pizza, LLC.” Another moment: “To connect your Domino’s Pizza profile, go to the Skills setting in the Alexa app. We will need your email address to place an order as a guest. Enable ‘Email address’ permissions in the Alexa app.” At this point you need to find a hidden setting in an app that you may not even have on your phone; it would be much easier to just go to the Domino’s website. Or, hell, to call the store.
If you know the skill you’re looking for, the system is a little better. You can say “Alexa, open Nature Sounds” or “Alexa, open Jeopardy” and a skill with that name will open. But unless you remember that the skill is called “Easy Yoga,” asking Alexa to start a yoga workout won’t get you anywhere.
Alexa can do many things. Figuring out which ones is the real challenge. Image: Amazon
There are small friction points like this throughout the system. Once you activate a skill, you must explicitly say “stop” or “cancel” to leave it and use another one. You can’t easily combine different skills: I’d like to compare prices for my pizza, but Alexa won’t let me. And perhaps most frustrating of all, even after you enable a skill, you still have to address it specifically. Saying “Alexa, ask AnyList to add spaghetti to my shopping list” is not a seamless interaction with an all-knowing assistant; it means learning an extremely specific computer language just to use it properly.
As it turns out, many of the most popular Alexa skills have two things in common: they are simple Q&A games, and they are made by Volley. From Song Quiz to Jeopardy to Who Wants to Be a Millionaire to Are You Smarter Than a 5th Grader, Volley is one company that has figured out how to make skills actually work. Max Child, co-founder and CEO of Volley, says presenting your skills to people is one of the most important – and most difficult – parts of the job.
“I think one of the underrated reasons for the success of the iOS and Android app stores is that Facebook ads are so good,” he says. The path from hypertargeted advertising to app installs has been ruthlessly refined over the years, and there’s no such thing with voice assistants. The closest equivalent is probably people asking their Alexa devices what they can do (which Child says is actually happening!), but that simply can’t compete with in-feed ads and hours of social media scrolling. “Because you don’t have this hyper-targeted marketing, you have to do broad marketing and create broad games.” Hence games like Jeopardy and Millionaire, which are huge franchises that appeal to just about everyone.
One of the ways Volley makes money is through subscriptions. For example, the full version of Jeopardy costs $12.99 per month, and like many other modern subscriptions, it’s much easier to subscribe than to cancel. It’s also one of the few ways to make money with skills: developers can run audio ads in certain skills or ask users to add credit card information directly, as Domino’s does, but asking a voice user to pick up a phone and dig through settings first is a high bar to clear. Advertising is only useful at large scale. There was a brief moment when many media companies thought so-called “flash briefings” might be a hit, but they never really took off.
To be fair, these challenges are not unique to Alexa. Mobile app stores have similarly massive discovery issues, monetization issues, sketchy subscription systems, and more. It’s just that in the case of Alexa, the solution seemed so tempting: you shouldn’t, and don’t, even need an app store. You should just be able to ask for what you want and have Alexa do it for you.
Ten years later, it seems that an all-knowing, all-powerful voice AI might just be impossible to create. If Amazon were to make everything so smooth and fast that you didn’t even need to know you were working with a third-party developer and your pizza magically appeared on your doorstep, it would raise huge privacy concerns and questions about how Amazon selects those suppliers. If it asks you to choose all of your defaults, every new user is signing up to do an awful lot of busy work. If it allows developers to build and support even more features, it destroys the simplicity that made Alexa so enticing in the first place. Too much simplicity and abstraction turns out to be a problem of its own.
However, we are at a turning point. Ten years after its launch, Alexa is changing in two key ways. One is good news for the future of skills; the other may be bad. The good news is that Alexa is no longer a voice-only, or even voice-first, service: as Echo Show and Fire TV devices become more popular, more and more people are interacting with Alexa with a screen nearby. This can solve many interaction problems and give developers new ways to present their skills to users. (Screens are also a great place to advertise skills, as Amazon knows all too well.) When Alexa can show you things, it can do so much more.
Child says most Volley players already use a device with a screen. “We have been interested in smart TVs for a long time,” he says with a laugh. “Every Smart TV sold today has a microphone in the remote control. I really think that regular voice games… can make a lot of sense and be even more engaging.”
Amazon is also going to re-engineer Alexa around LLMs, which could be the key to making this all work. A smarter, AI-powered Alexa could finally understand what you’re actually trying to do and get rid of the awkward syntax required to use skills. It could understand more complex questions and multi-step instructions and use skills on your behalf. “Developers now just need to describe the capabilities of their device,” Amazon’s Charlie French said at an Alexa AI launch event last year. “They don’t have to try to predict what the customer is going to say.” This is the change that may be bad news for skills: Amazon is just one of the companies promising that LLMs will be able to do things on your behalf without you having to do any extra work. Do skills even need to exist in this world, or will a model just figure out how to order a pizza?
There is some evidence that Amazon is lagging behind in its AI work, and that plugging in a language model won’t suddenly make Alexa awesome. (Even the best LLMs feel like they are only barely close to being good enough to do these things.) But even if that’s the case, it only makes the bigger question more important: what can virtual assistants actually do for us? And how do you ask them to do it? The correct answers are “anything you want” and “however you want.” That takes a lot of developers bringing new capabilities to Alexa. And that requires Amazon to give them a product and a business worth the effort.
Credit : www.theverge.com