A new technology company, XDOF, has emerged from stealth mode with $70 million in funding to tackle a fundamental challenge in the burgeoning field of physical artificial intelligence (AI). The startup aims to provide the data infrastructure needed to train robots to operate effectively in the real world, a gap that has left robotics lagging behind the advances seen in large language models (LLMs).
Major AI laboratories, including OpenAI, are increasingly shifting their focus back to robotics, recognising that the next significant leap in AI capabilities will involve machines interacting physically with their environment. However, unlike LLMs, which benefited from vast quantities of publicly available text data, robots require high-fidelity data capturing physical interactions, which is currently scarce and difficult to collect. Existing sources like YouTube videos often lack the precision and detail required for robust robotic training.
XDOF, co-founded by Philipp Wu, Fred Shentu, and Nemo Jin, believes this data bottleneck, rather than model architecture or processing power, is the primary hurdle for advanced robotics. Wu, whose PhD research at UC Berkeley highlighted the scarcity of large-scale datasets for robot learning, developed a low-cost teleoperation system called GELLO to address this very issue. This system allows human operators to control robotic arms, generating the detailed training data necessary for machine learning.
The company's strategy involves building comprehensive data pipelines, specialised collection tools, and sophisticated annotation systems that frontier AI labs and robotics firms would struggle to develop in-house. XDOF is already working with 20 customers, including several prominent AI research labs, though specific names have not been disclosed. The startup has also partnered with UC Berkeley's AI Research lab to release ABC, described as the largest collection of high-quality robot manipulation data ever assembled, comprising 130,000 trajectories and hundreds of hours of simulation and evaluation data.
This initiative could significantly accelerate the development of robots capable of performing complex physical tasks, from folding laundry to flattening boxes or precisely loading small components. Scaled-up pre-training data of this kind, previously unavailable to academia, is expected to foster unforeseen advancements within the robotics community, mirroring the rapid progress in language and image generation that followed similar data releases.
The company plans to operate across a 'data pyramid', prioritising highly valuable teleoperation data collected directly from deployed robots, followed by more general teleoperation data, and finally 'egocentric' data captured by humans performing everyday tasks. This multi-tiered approach aims to create a self-reinforcing feedback loop that continuously improves the data ecosystem for robot trainers.