Google uses AI to teach a robot how to grasp and throw things
Robots with an intuitive understanding of physical laws might sound like something out of an Isaac Asimov novel, but scientists at Google’s robotics division say they’ve essentially created them. In doing so, they contend, they’ve potentially laid the groundwork for future systems capable of learning tosses, slides, spins, swings, catches, and other athletic feats that currently pose a challenge for even the most capable machines.
“Though considerable progress has been made in enabling robots to grasp objects efficiently, visually self-adapt or even learn from real-world experiences, robotic operations still require careful consideration in how they pick up, handle, and place various objects — especially in unstructured settings,” wrote Google student researcher Andy Zeng in a blog post. “But instead of just tolerating dynamics, can robots learn to use them advantageously, developing an ‘intuition’ of physics that would allow them to complete tasks more efficiently?”
In an attempt to answer that question, Zeng and colleagues collaborated with researchers at Princeton, Columbia, and MIT to develop a picker robot they dubbed TossingBot, which learns to grasp and throw objects into boxes outside the confines of its “natural range.” It’s not only twice as fast as previous state-of-the-art models, but also achieves twice the effective placing range, and it can improve through self-supervision.
Throwing with predictability isn’t easy — even for humans. Grasp, pose, mass, air resistance, friction, aerodynamics, and countless other variables affect an object’s trajectory. Modeling projectile physics through trial and error is possible to an extent, but Zeng notes that it would be computationally expensive, require a lot of time, and wouldn’t produce a particularly generalizable policy.
Instead, TossingBot uses a projectile ballistics model to estimate the velocity needed to get an object to a target location, and it uses end-to-end neural networks — layers of mathematical functions modeled after biological neurons — trained on visual and depth data from overhead cameras to predict adjustments on top of that estimate. Zeng says this hybrid approach enables the system to achieve throwing accuracies of 85 percent.
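To make the hybrid idea concrete, here is a minimal illustrative sketch — not Google’s actual TossingBot code — of how an analytical ballistics estimate can be combined with a small learned correction. All names here (ResidualNet, estimate_release_velocity, the fixed release angle) are hypothetical stand-ins for the system described above.

```python
# Illustrative sketch only: physics estimate + learned residual correction.
import math
import torch
import torch.nn as nn

G = 9.81  # gravitational acceleration (m/s^2)

def estimate_release_velocity(dx: float, dz: float, angle_rad: float) -> float:
    """Closed-form projectile estimate: speed needed to cover horizontal
    distance dx and height change dz when released at a fixed angle."""
    cos_a, tan_a = math.cos(angle_rad), math.tan(angle_rad)
    denom = 2.0 * cos_a**2 * (dx * tan_a - dz)  # valid when target is reachable
    return math.sqrt(G * dx**2 / denom)

class ResidualNet(nn.Module):
    """Tiny stand-in for the perception network: maps visual features of the
    grasped object to a correction on top of the physics estimate."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.mlp(features)

def throwing_velocity(features, dx, dz, angle_rad, residual_net):
    v_physics = estimate_release_velocity(dx, dz, angle_rad)  # analytical prior
    delta_v = residual_net(features).item()                   # learned adjustment
    return v_physics + delta_v
```

The physics term handles most of the work of hitting a target box, while the learned residual absorbs object-specific effects (grasp offset, mass distribution, drag) that a closed-form model can’t capture.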
Teaching TossingBot to grasp objects is a bit trickier. It first attempts “bad” grasps repeatedly until it identifies better approaches, while concurrently improving its ability to throw by occasionally tossing objects at random velocities it hasn’t tried before. After 10,000 grasp and throw attempts over the course of about 14 hours, TossingBot can firmly grasp an object in a cluttered pile about 87 percent of the time. A simplified sketch of this trial-and-error loop follows below.
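The sketch below is a hypothetical outline of such a self-supervised loop, not the authors’ implementation. The robot and policy objects and their methods (attempt_grasp, throw, landed_in_target, and so on) are assumed interfaces; the point is that every attempt labels itself, so no human annotation is needed.

```python
# Hypothetical self-supervised grasp-and-throw episode (assumed interfaces).
import random

def self_supervised_episode(robot, policy, explore_prob: float = 0.1):
    """One grasp-and-throw attempt that generates its own training labels."""
    observation = robot.capture_heightmap()          # overhead visual + depth data
    grasp = policy.propose_grasp(observation)
    grasp_success = robot.attempt_grasp(grasp)

    throw_success = False
    if grasp_success:
        velocity = policy.propose_velocity(observation, grasp)
        # Occasionally explore a random velocity to discover new behavior.
        if random.random() < explore_prob:
            velocity = policy.random_velocity()
        robot.throw(velocity)
        throw_success = robot.landed_in_target()     # checked automatically

    # Successes and failures both become labels for the next update.
    policy.update(observation, grasp, grasp_success, throw_success)
    return grasp_success, throw_success
```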
Perhaps more impressively, TossingBot can adapt to never-before-seen locations and objects like fake fruit, decorative items, and office objects after an hour or two of training with similar, geometrically simpler knickknacks. “TossingBot likely learns to rely more on geometric cues (e.g. shape) to learn grasping and throwing,” Zeng said. “These emerging features were learned implicitly from scratch without any explicit supervision beyond task-level grasping and throwing. Yet, they seem to be sufficient for enabling the system to distinguish between object categories (i.e., ping pong balls and marker pens).”
The researchers concede that TossingBot hasn’t been tested with fragile objects and that it relies strictly on visual data as input, which might have impeded its ability to react to new objects in tests. But they say the basic premise — combining physics and deep learning — is a promising direction for future work.