A new wave of robotic intelligence, drawing parallels with the rapid advancements in Large Language Models (LLMs), is emerging from a San Francisco start-up, Physical Intelligence. Founded in 2024, the company is focused on creating a versatile 'brain' for robots, designed to learn and execute a multitude of tasks rather than being confined to a single function. This approach marks a significant departure from the specialised robots commonly seen in factories or those designed for specific household chores.
At the core of Physical Intelligence's work are 'vision-language-action' (VLA) models, essentially the robotic equivalent of the LLMs that power popular AI chatbots. Instead of predicting the next word in a sentence, a VLA predicts the next physical action a robot needs to take to complete a task, based on general instructions. This allows robots to interpret broad commands and translate them into specific, complex actions. A robot can move from making coffee to folding laundry or peeling vegetables with relative ease, whereas each of those skills previously required extensive individual programming.
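To make the comparison with next-word prediction concrete, the loop can be sketched in a few lines of Python. This is a toy illustration only, not Physical Intelligence's software: the names `Observation`, `toy_policy`, `apply_action` and `run_episode` are hypothetical, and a hand-written rule stands in for what would really be a learned neural network working from camera images. What it shows is the shape of the idea: take an observation and an instruction, predict one action, act, and repeat.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Toy stand-in for what the robot perceives (real systems use camera images)."""
    gripper_position: int
    target_position: int


def toy_policy(observation: Observation, instruction: str) -> str:
    """Hypothetical stand-in for a VLA model: returns the next action.

    A real model would be learned from data; this hand-written rule only
    illustrates the inputs (observation, instruction) and output (action).
    """
    if "reach" not in instruction:
        return "stop"
    if observation.gripper_position < observation.target_position:
        return "move_right"
    if observation.gripper_position > observation.target_position:
        return "move_left"
    return "stop"


def apply_action(observation: Observation, action: str) -> Observation:
    """Toy world model: moves the gripper one step in the chosen direction."""
    step = {"move_right": 1, "move_left": -1, "stop": 0}[action]
    return Observation(observation.gripper_position + step,
                       observation.target_position)


def run_episode(instruction: str, observation: Observation,
                max_steps: int = 20) -> list[str]:
    """Predict one action at a time, much as an LLM emits one word at a time."""
    actions = []
    for _ in range(max_steps):
        action = toy_policy(observation, instruction)
        actions.append(action)
        if action == "stop":
            break
        observation = apply_action(observation, action)
    return actions


if __name__ == "__main__":
    print(run_episode("reach the cup", Observation(0, 3)))
```

The point of the sketch is that the instruction is general ("reach the cup") while each predicted action is specific, and that the model is consulted afresh after every step so it can react to what it now observes.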
A significant hurdle in robotics has always been the sheer volume of data required to train machines for the near-infinite variations of real-world tasks. Physical Intelligence, however, is banking on its VLA-powered approach to drastically reduce this data dependency. The company trains its robots in a range of mock environments, including supermarkets, bedrooms, and kitchens, which are frequently reconfigured. This varied training, coupled with testing in actual lived-in homes, helps the robots learn to generalise across tasks and adapt to the unpredictable nature of real-world settings.
Sergey Levine, a co-founder from the University of California, Berkeley, argues that solving a wider range of problems in AI can actually make learning easier, because it gives a model a richer and more diverse knowledge base to draw on. This principle underpins the company's strategy, allowing robots to learn from a broader spectrum of experiences. For instance, a recent model, π0.7, was able to cook sweet potatoes in an air fryer by following verbal instructions, despite having no prior experience with the appliance, a demonstration of the power of generalisation.
The potential implications of such adaptable robotic intelligence are vast, extending beyond industrial automation. General-purpose robotic intelligence has long been a goal for roboticists, and the confluence of enhanced computing power, advanced algorithms, and data availability is now accelerating progress. This could pave the way for robots to become more deeply integrated into various sectors, from domestic assistance to complex logistical operations, offering flexibility and efficiency previously unattainable.
Experts like Ingmar Posner from the University of Oxford note that VLAs represent the most direct translation of the excitement surrounding large language models into the physical realm. The speed of progress observed by Levine in just two years of Physical Intelligence's operation suggests that the vision of robots seamlessly performing diverse tasks in everyday life might be closer than anticipated.