It was no coincidence.
OpenAI meticulously timed its Monday announcement about the latest developments with its GPT-4o model. We covered it in yesterday’s Outer Limits — Is Google Doomed?
It was the throwing down of a gauntlet. One-upmanship. And a clear message to the team at Google — “you’re way behind.”
OpenAI chose to make its announcement to preempt Google, which held its annual developer conference, Google I/O, just 24 hours later.
For anyone who is really interested in what Google is up to, the full two-hour keynote presentation can be found here.
For those of us short on time, that’s what this issue of Outer Limits is about.
And it won’t take much guessing to figure out that the entire event was all about artificial intelligence (AI).
The latest developments at OpenAI, Google, X (Twitter), and even Meta may seem distant or even irrelevant to some of us, given the advanced technology being used.
But I assure you, these topics we’re exploring are going to dramatically impact our lives, whether we drive trucks, lay concrete, balance books, sell insurance, or work in healthcare.
What these companies are building are the foundational models upon which thousands of AI-powered applications will be built.
The technology is already being woven into products we use every day. It won’t be long before we don’t even think about it. “It” will just be there.
And because these companies have technology platforms that reach the majority of the world’s population, and the technology is software-based, distribution and adoption will happen at light speed.
Not surprisingly, there were some striking similarities between OpenAI’s announcement and Google’s.
Both companies touted more efficient AI models that are faster and lower cost in terms of compute.
This was a point that we explored yesterday. If these personalized AIs are too computationally intensive, it will be difficult to roll out the technology to the mass market. While venture capital firms often effectively subsidize a business model that loses money in order to drive growth (i.e. make it up on the back end), eventually the revenue generated needs to be larger than the costs of delivering that product or service.
In other words, mass rollout can’t happen until operational costs come down far enough…
And in the case of AI, those operational costs are all about the cost of computation. So whether the business model is driven by advertising (Alphabet/Google, Meta, Microsoft), subscriptions (OpenAI, Microsoft), or licensing (OpenAI), it has to generate free cash flow to fund future development.
That’s why these companies are working so hard to drive down costs. It will enable global deployments of this powerful technology to the mass market. And in the world of advertising, that’s where the big money is found, at massive scale.
In addition to speed, efficiency, and cost, Google also demonstrated the same kind of multi-input, conversational, multi-modal AI design as OpenAI.
While it was obvious that Google is indeed way behind OpenAI, and most of what it announced is still in the lab, it just doesn’t matter that much. Google has nearly unlimited financial resources… and arguably the most talented team of AI researchers in the world in the DeepMind team.
DeepMind’s CEO, Demis Hassabis, was placed in charge of Google’s AI research this March. It was a major shakeup, and a result of all the incredible work from him and his team in London, work we’ve been tracking in Outer Limits.
Assuming he can keep his team together and leverage Google’s resources, Google should be able to catch up with OpenAI over the next 12-18 months.
There were also some key differences between Google’s and OpenAI’s announcements…
Where Google’s announcements differed from OpenAI’s was around Google’s consumer-facing products… and how AI is being incorporated into its own software services.
Take Google Photos for example…
Google is using its latest Gemini large language model (LLM) to power a new feature planned for launch this summer — Ask Photos.
Rather than endlessly scrolling through photos looking for that one moment, consumers will simply be able to tell the AI, “find me that photo of me and Sara at Yosemite National Park in the fall when it was raining.” In a moment, the photos will appear.
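For readers who like to see the mechanics, here’s a rough sketch of how a natural-language photo query could be served by a multimodal model such as Gemini. To be clear, this is not Google’s Ask Photos implementation. The model name, the local “photos” folder, and the yes/no matching prompt are assumptions for illustration, and a real product would index photos with embeddings rather than querying one image at a time.

```python
# A rough sketch of a natural-language photo query against a multimodal model,
# using the google-generativeai Python SDK. This is NOT Google's Ask Photos
# implementation; the model name, "photos" folder, and yes/no prompt are
# illustrative assumptions.
import os

from PIL import Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

query = "me and Sara at Yosemite National Park in the fall when it was raining"

def photo_matches(path: str) -> bool:
    """Ask the model whether a single photo matches the description."""
    image = Image.open(path)
    response = model.generate_content(
        [f"Does this photo match the description: '{query}'? Answer YES or NO.", image]
    )
    return response.text.strip().upper().startswith("YES")

photo_dir = "photos"
matches = [
    name for name in os.listdir(photo_dir)
    if name.lower().endswith((".jpg", ".jpeg", ".png"))
    and photo_matches(os.path.join(photo_dir, name))
]
print(matches)
```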
As a reminder, Google already rifles through all of our photos, e-mails, and web searches, tracks where we are throughout the day with the GPS in our phones, and even listens to us speaking with one another or even talking to ourselves. That’s how the AI is capable of contextually understanding the people in our lives and our experiences, and how it can do something like find that one photo amongst more than 10,000.
Google also made some announcements about new generative AI features related to learning — LearnLM.
One of the most interesting functionalities of the LearnLM feature is the ability to query the AI when watching a lecture or a video:
While watching an educational video on YouTube, the user can ask specific questions about the subject matter, almost as if they were in class with a private tutor.
A generative AI already has the contents of — and an understanding of — the lecture in its “memory,” so it’s capable of giving the viewer helpful instruction with as much depth and context as desired.
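Under the hood, the “private tutor” behavior amounts to a model answering questions grounded in the lecture content it has been given. Below is a minimal sketch, assuming a Gemini model and a transcript already extracted from the video; the file name, model name, and sample question are illustrative, not how LearnLM is actually integrated with YouTube.

```python
# A minimal sketch of the "ask questions about a lecture" idea, assuming a Gemini
# model and a transcript already extracted from the video (e.g. YouTube captions).
# The file name, model name, and question are illustrative assumptions.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

with open("lecture_transcript.txt") as f:
    transcript = f.read()

question = "Can you walk me through the second example the instructor covers, step by step?"

response = model.generate_content(
    "You are a patient tutor. Using only the lecture transcript below, answer the "
    "student's question with as much depth and context as they need.\n\n"
    f"TRANSCRIPT:\n{transcript}\n\nQUESTION: {question}"
)
print(response.text)
```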
And not surprisingly, Google is integrating Gemini right into its Google Workspace group of products, like Gmail, Docs, Sheets, and other productivity-focused software.
But of all the announcements, the one that was the most interesting was one that Demis Hassabis brought from DeepMind.
It’s the big vision for Google’s future — Project Astra.
In his announcement, Hassabis referred to it as a multi-modal universal assistant — a “Star Trek Communicator.”
Project Astra is clearly a work in progress, but it is on the same track as OpenAI with its multi-modal AI. It uses voice, audio, video, and real-time camera inputs to understand its surroundings and provide utility to the user.
The demo of Project Astra showed the user interacting with a human-like AI, capable of understanding and synthesizing its surroundings. For those who would like to have a closer look, here is a short two-minute video of its current capabilities.
Shown above, part of the video demonstrated the multi-modal AI’s ability to read software code off another screen, synthesize that code, and explain its purpose.
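For the curious, the same capability can be approximated today by handing a multimodal model an image and a question. The sketch below assumes the Gemini API and a hypothetical screenshot file; Project Astra’s actual pipeline, which streams live video and audio from the device, has not been published.

```python
# A hedged sketch of the same capability using the Gemini API directly: hand the
# model a screenshot that contains code and ask it to explain what that code does.
# The screenshot path and model name are assumptions; Project Astra's own pipeline
# is not public.
import os

from PIL import Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

screenshot = Image.open("screen_capture.png")
response = model.generate_content(
    [
        "Read the code visible in this screenshot and explain, in plain English, "
        "what it does and what its likely purpose is.",
        screenshot,
    ]
)
print(response.text)
```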
The Astra demo is an example of a major trend in AI right now: developing AI agents. We can think of these agents, such as our personal AI assistants, as software that can understand, respond to, and take action in the real world.
This is where the utility comes into play.
And with the ability to “see” and “hear” the real world comes another power that I doubt most of us have thought about.
These generative AIs have the ability to understand how we feel, what our mood is, what we’re troubled with, what we hate, and what we desire.
Just have a look at what the latest ChatGPT from OpenAI understood about its CEO’s expression below.
It’s worth reading the caption in full and comparing that to the expression on Altman’s face. It’s pretty wild.
As long as the AI can see us or hear us, it will have expert-level abilities to understand our facial expressions, tone, and emotions. As GPT-4o demonstrated yesterday, it has the ability to understand sarcasm. It even understands the nuance between a regular joke and a “Dad joke.”
Our companion AIs will have the ability to discuss our troubles with us, without the risk of feeling judged… after all, it’s just software, right? We can’t hurt its feelings.
Our AIs will be able to comfort us when we’re upset, help us work through a problem at work, even brainstorm solutions to a tricky situation.
This is why we’ll develop emotional ties with “them.” And the more interaction they have with us, the more effective “they” will be at helping us.
There has been so much speculation around how this kind of technology will disrupt Google’s advertising revenues. Most believe that it will be bad for Google. But I’d like to offer a different view.
If Google understands our current state of mind, our desires, our mood at any moment, hears our thoughts, and speaks with us regularly throughout the day — which it arguably already does after “watching” us as we’ve moved about online and interacted with Google apps and technology… for over two decades…
Will it have more or less data that it can use for the purposes of generating advertising revenue? I think we all know the answer…
But okay, what about less screen time as a result of the voice interaction?
Our AIs will work with us and support us across multiple devices. So whether we’re in a browser on our desktop or laptop computer, or if we’re mobile and have our phone or tablet in our hands…
Or if we’re interacting directly with our personal AIs having a normal conversation…
Google knows where we are and which device we’re using. It will know our behaviors and which products we’re in the market for. And now it’ll have direct feedback on our innermost states of being.
Believe me, there will be plenty of opportunities to present us with that ad, precisely when we’re in an optimal mindset to purchase.
And during the short video on Project Astra, Google briefly demonstrated a new pair of smart glasses, which gave the AI vision.
The glasses acted in the same way that a smartphone camera does, just hands free.
I remain bullish on the augmented reality (AR) eyewear space, as the utility of AR glasses is so obvious when we are working, walking, and interacting with the world around us. They free up our hands and allow us to interact more freely, with less friction.
And just like the flow of water, the rate of adoption of a new technology increases in proportion to the reduction in friction it provides.
We always welcome your feedback. We read every email and address the most common comments and questions in the Friday AMA. Please write to us here.