Chatbots such as ChatGPT and voice assistants like Siri and Alexa have become ubiquitous parts of our daily lives. Millions of people regularly converse with these AI-powered tools to get information, set reminders, control smart devices, and more. However, while chatbots and virtual assistants represent an important early milestone in the development of artificial intelligence, they remain limited in significant ways.
These conversational agents can engage only in simple, narrow tasks, operating in an isolated, disembodied way, devoid of richer context and unable to leverage external data sources. Moreover, most common AI tools offer only rigid text or voice interfaces, restricting more natural interaction. The true promise of AI lies in building assistants that can seamlessly interact with us and with other applications to accomplish complex, multi-step goals.
The Vision for Interactive AI
In an interview, Mustafa Suleyman, the co-founder of DeepMind, shared his vision for the next generation of interactive AI. He argued that we need to evolve beyond text-only interfaces, giving AI the ability to interpret prompts, gather external information, and execute multi-step processes. This could lead to AI that feels more natural and intuitive to converse with.
Suleyman is right that the future lies in conversational agents that are deeply integrated with the digital world around them. Working alone, even the most advanced natural language AI cannot match the knowledge and capabilities of the collective digital landscape. By interconnecting AI assistants with application programming interfaces (APIs), databases, and other software, we gain access to orders of magnitude more data and functionality.
Better Methods Needed for AI/App Integration
To achieve this vision, we need better methods for interfacing between AI and external applications. Most APIs today are designed for software, not conversational AI. We need to develop more natural language APIs that allow an AI agent to query a database or utilize a service through dialogue.
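One way to picture such an interface is a registry of application functions, each paired with a natural-language description the agent can read when deciding what to call. The sketch below is purely illustrative, assuming hypothetical names (`Tool`, `ToolRegistry`, `query_inventory`) rather than any real framework's API:

```python
# Minimal sketch: exposing an application function as a "tool" a
# conversational agent could invoke from dialogue. All names here are
# illustrative, not taken from any real library.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str          # natural-language description the agent reads
    func: Callable[..., str]  # the underlying application call


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> str:
        # What an agent would see when choosing which service to use.
        return "\n".join(f"{t.name}: {t.description}"
                         for t in self._tools.values())

    def call(self, name: str, **kwargs) -> str:
        # Execute the tool the agent selected, with its chosen arguments.
        return self._tools[name].func(**kwargs)


# A stand-in database query the assistant could reach through dialogue.
def query_inventory(item: str) -> str:
    stock = {"widgets": 42, "gears": 7}
    return f"{item}: {stock.get(item, 0)} in stock"


registry = ToolRegistry()
registry.register(Tool("query_inventory",
                       "Look up how many units of an item are in stock.",
                       query_inventory))

print(registry.call("query_inventory", item="widgets"))
```

The key design point is that the description field, not a rigid schema alone, mediates between conversation and software: the agent reads human-readable descriptions, then the registry translates its choice into an ordinary function call.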
User interface design is another crucial piece of the puzzle. The ideal interactive AI assistant should be platform-agnostic, operating seamlessly across voice-only devices like smart speakers as well as smartphones, computers, and augmented reality headsets. Designing UI/UX flows that work across modalities is no small feat.
However, once achieved, better UI designs will make interacting with AI tools feel more seamless and productive. In an office setting, for example, an AI assistant could respond to questions by pulling up relevant data visualizations, producing knowledge graphs that organize information from multiple sources, or even filling out forms and interfacing with other software as needed. The human brain processes visual information far faster than text, so more flexible and sophisticated UIs would let us communicate with AI assistants more efficiently, enabling more intuitive and creative work.
Multi-Modal Interfaces Improve Accessibility
To truly democratize access to these tools, AI needs to work seamlessly across modalities such as voice, touchscreens, and VR. People with disabilities could particularly benefit from AI that fluidly adapts to each user's needs and capabilities.
Voice-only interfaces like Alexa exclude the deaf community. Visual interfaces exclude the blind. But conversational AI that flexibly bridges voice, visuals, and even sign language and haptics could enable more inclusive human-computer interaction.
AI as an Intelligence Amplifier
The potential payoff of interactive AI is immense. By combining natural conversational abilities with more sophisticated UI and the capacity to directly manipulate data, media, and digital systems, this technology can enable entirely new forms of augmenting human intelligence.
Rather than replacing human cognition, this positions AI as an amplifier, enhancing our problem-solving abilities in symbiotic ways. Humans provide oversight, creativity, and strategic thinking. AI supplies computational power, access to vast knowledge, and the ability to carry out tasks.
For example, an AI assistant could help a scientist rapidly iterate through hypotheses by automatically designing and running experiments that test parameters verbally described by the researcher. The human provides ingenuity and high-level guidance, while the AI handles the heavy lifting.
By creating intuitive interfaces for humans to guide interactive AI systems, while connecting them to the massive troves of digitized data and media that exist today, we can solve problems and make discoveries that neither could alone. This vision foresees AI not as pre-scripted bots with limited capabilities, but rather as flexible collaborators that feel like extensions of our own minds.
Image Credit: Wayne Williams
Victor Botev is CTO and co-founder at Iris.ai.