Forget chat. AI that can hear, see and click is already here

Exhibit A: Google’s NotebookLM, a research tool the company launched with little fanfare a year ago. A few weeks ago, Google added an AI podcasting feature called Audio Overviews to NotebookLM, which allows users to create podcasts about anything. Add a link to, for example, your LinkedIn profile, and the AI podcast hosts will boost your ego for nine minutes. The feature has become a surprise viral hit. I wrote about all the weird and amazing ways people are using it here. 

To give you a taste, I created a podcast of our 125th-anniversary magazine issue. The AI does a great job of picking some highlights from the magazine and giving you the gist of what they are about. Have a listen below. 

Multimodal generative content has also become markedly better in a very short time. In September 2022, I covered Meta’s first text-to-video model, Make-A-Video. Next to today’s technology, those videos look clunky and silly. Meta just announced its competitor to OpenAI’s Sora, called Movie Gen. The tool lets users create custom videos and sounds from text prompts, edit existing videos, and turn still images into videos.

The way we interact with AI systems is also changing, becoming less reliant on text. OpenAI’s new Canvas interface allows users to collaborate on projects with ChatGPT. Instead of relying on a traditional chat window, which requires users to do several rounds of prompting and regenerating text to get the desired result, Canvas allows people to select bits of text or code to edit. 

Even search is getting a multimodal upgrade. In addition to inserting ads into AI overviews, Google has rolled out a new feature where users can upload a video and use their voice to search for things. In a demo at Google I/O, the company showed how you can open the Google Lens app, take a video of fish swimming in an aquarium, and ask a question about them. Google’s Gemini model will then search the web and offer you an answer in the form of Google’s AI summary. 

What unites these features is a more interactive, customizable interface and the ability to apply AI tools to lots of different types of source material. NotebookLM was the first AI product in a while that brought me wonder and delight, partly because of how different, realistic, and unexpected the AI voices were. But the fact that NotebookLM’s Audio Overviews became a hit despite being a side feature hidden inside a bigger product just goes to show that AI developers don’t really know what they are doing. Hard to believe now, but ChatGPT itself was an unexpected hit for OpenAI.

We are a couple of years into the multibillion-dollar generative AI boom. The huge investment in AI has contributed to rapid improvement in the quality of the resulting content. But we’ve yet to see a killer app, and these new multimodal applications are a result of the immense pressure AI companies are under to make money and deliver results. Tech companies are throwing different AI tools at people and seeing what sticks. 


Deeper Learning

AI-generated images can teach robots how to act