Using voice commands with a smartphone is nothing new, but Samsung’s new digital assistant, Bixby, goes beyond voice recognition to incorporate deep learning and expanded visual search, making it feel more like a real digital assistant living in your device.
Bixby draws immediate comparisons to Apple’s Siri and Google’s Assistant. But while we often think of those as simply voices, Samsung says Bixby has its own card-based visual interface to convey information. Voice is just one part of the equation. The other aspects are vision (like using the camera to scan a QR code, find out the cost of a book based on its cover, or translate text), reminders, and recommendations. Bixby is the umbrella term for those four smart functions.
Sriram Thodla, a senior director at Samsung focusing on intelligence and the internet of things, introduced Bixby to the public during the Galaxy S8 and S8+ announcement event on Wednesday. “Bixby understands context,” he said. “It knows what’s happening on your screen.”
For example, you can ask it to take a screenshot of what you’re doing, then send that image to a contact. This kind of complex request spanning multiple apps and services has proved problematic for digital assistants in the past.
Samsung’s new flagship phones, the Galaxy S8 and S8 Plus. The Bixby button is on the left side of the device.
“We say Bixby is an intelligent user interface,” Mok Oh, a vice president for services strategy at Samsung, said in an interview at a press event on Monday.
Oh touted Bixby’s completeness: if an app is Bixby-enabled, anything you can do with touch can also be done through voice. For example, you could ask Bixby to switch your phone’s display language to another language, and Bixby should make it so. The assistant is also “cognitively forgiving,” Oh said, so it should cope with ambiguous requests.
Oh went on to highlight the phone’s photo app, called Gallery, and the thousands of different combinations of tasks a user could perform within it. There are countless ways a user could ask for an image to be cropped or edited, and Bixby should be able to handle them.
“In many ways we apply deep learning technology” to Bixby, Oh said. For example, after Bixby handles a request, it will give users a thumbs-up or thumbs-down option to let it know how it did, which helps it learn. “Actually, we apply learning in many, many different aspects of our whole technology stack for this,” he added.
That kind of thumbs-up or thumbs-down feedback is critical for virtual agents like Bixby, said Alex Rudnicky, a research professor of computer science at Carnegie Mellon University who focuses on speech. “You need some kind of a reinforcement that basically allows the system to learn—basically understand the connection between what the user wants, and what actually happens,” he said. “Realistically, the agent’s going to make a lot of mistakes.”
Amazon’s Alexa app has a similar function, asking the user if it did what they wanted.
In addition to listening, Bixby can see into the real world. Using the S8’s built-in camera, Bixby can detect objects in a scene and search for information about them, as well as about related products. Of course, it will also let you buy them from Samsung’s partners. Siri doesn’t currently offer this feature; Google Assistant does, often with mixed results. Still, this type of augmented reality-style interaction is a logical step for AI as a personal assistant.
For visual search, Samsung has tapped a variety of partners, including Amazon for shopping, Foursquare for location-specific functions, and Google Translate for interpreting signs in different languages. In one example, Thodla took a picture of New York’s iconic Flatiron Building and got information about it, as well as good food options in the area.