Waymo open-sources its self-driving data
Six months ago, Waymo launched Waymo One, its commercial driverless taxi service, with a fleet of over 600 cars and safety drivers behind the wheel. According to the company, “the operation has grown to serve over 1,000 riders in that time.” Waymo also revealed that its cars have driven 10 billion miles autonomously in simulation and 10 million miles autonomously on real roads in 25 cities. The company has also partnered with Lyft to deploy 10 of its vehicles on the ride-hailing platform in Phoenix.
Waymo has competition in Yandex, Tesla, Zoox, Aptiv, May Mobility, Pronto.ai, Aurora, Nuro, and GM’s Cruise Automation, to name just a few.
Waymo isn’t the first to open-source its data: Lyft, Argo AI, and other firms have already open-sourced some data sets. However, Waymo’s move is notable because its vehicles have already covered millions of miles on public roads.
The Waymo Dataset contains data collected over the course of the millions of miles its cars have driven in Phoenix, Kirkland, Mountain View, and San Francisco. The data covers a wide variety of urban and suburban environments during day and night, dawn and dusk, and sunshine and rain. The samples are divided into 1,000 driving segments, each of which captures 20 seconds of continuous driving. Recorded at 10 Hz, the segments together amount to 200,000 frames from the sensors affixed to every car. These sensors include five custom-designed lidars and front- and side-facing cameras.
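The 200,000-frame figure is the total across all segments, not the count per segment. A minimal Python sketch of that arithmetic, using only the numbers quoted above, makes this explicit. The constant and function names are illustrative assumptions and are not part of any Waymo tooling.

```python
# Figures quoted in the article; all names here are illustrative only.
SEGMENTS = 1_000            # number of driving segments
SECONDS_PER_SEGMENT = 20    # continuous driving per segment
SAMPLE_RATE_HZ = 10         # frames captured per second


def frames_per_segment(seconds: int, rate_hz: int) -> int:
    """Number of sensor frames captured in a single segment."""
    return seconds * rate_hz


def total_frames(segments: int, seconds: int, rate_hz: int) -> int:
    """Number of sensor frames across all segments."""
    return segments * frames_per_segment(seconds, rate_hz)


if __name__ == "__main__":
    print(frames_per_segment(SECONDS_PER_SEGMENT, SAMPLE_RATE_HZ))       # 200
    print(total_frames(SEGMENTS, SECONDS_PER_SEGMENT, SAMPLE_RATE_HZ))   # 200000
```

Each segment therefore holds 200 frames. The same figures imply 20,000 seconds, or roughly five and a half hours, of recorded driving in total.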
The data also includes labeled lidar frames and images with vehicles, pedestrians, cyclists, and signage, capturing a total of 12 million 3D labels and 1.2 million 2D annotations. Waymo says the camera and lidar frames have been synchronized by its in-house 3D perception models that fuse data from multiple sources, obviating the need for manual alignment.
Waymo’s launch of its enormous data set came after Lyft revealed its own open-source data set for autonomous vehicle development. Lyft’s data contains 55,000 human-labeled 3D annotated frames of traffic agents, bitstreams from cameras and lidar sensors, a drivable surface map, and an underlying HD spatial semantic map that includes over 4,000 lane segments, 197 crosswalks, 60 stop signs, 54 parking zones, eight speed bumps, and 11 speed humps.
Other such collections include Mapillary Vistas’ data set of street-level imagery, the KITTI collection for mobile robotics and autonomous driving research, and the Cityscapes data set developed and maintained by Daimler, the Max Planck Institute for Informatics, and the TU Darmstadt Visual Inference Group.
While Waymo deserves some credit for its move, it’s sharing just a tiny amount of the information it has gathered. Other companies are also hoarding data for competitive reasons, and they are especially reluctant to share information related to accidents and near-misses. But if the industry wants to overcome concerns about autonomous vehicles’ safety, the businesses in it will have to become far more transparent about what they’ve learned.