Simulation is playing an increasingly important role in the development of safe and robust autonomous systems, especially with the advent of deep learning techniques. Given the challenges and effort involved in collecting data in real life, simulation provides an efficient alternative for gathering labeled training data for sensor observations, vehicle dynamics and environmental interactions. Furthermore, simulation allows extended evaluation through corner cases, such as failures, that would be impractical or unsafe to reproduce in a real-life setup.
Over the last decade, simulations have steadily improved in visual and physical fidelity. Game engines such as Unreal Engine and Unity provide several advanced graphical capabilities out of the box, such as real-time ray tracing, high-resolution texture streaming and dynamic global illumination. Such game engines have also formed the base for several robotics and autonomous systems simulators such as AirSim and CARLA, which allow users to deploy robotic platforms such as drones and cars equipped with cameras and other sensors in large 3D worlds.
While present simulations can generate high-quality camera imagery, they often fall back on simplified models for non-visual classes of sensors. Complex sensors such as LiDAR, which lie at the heart of most present-day autonomous systems such as self-driving cars, are challenging to model given their dependence on aspects such as the material properties of every object in an environment. Designing accurate LiDAR sensors in simulation therefore requires significant effort in handcrafting environmental factors and carefully encoding sensor characteristics for every new model. To alleviate this, we examine a new perspective on sensor modeling: one that involves learning sensor models from data. In our recent work “Learning to Simulate Realistic LiDARs”, we investigate how simulated LiDAR sensor models can be made more realistic using machine learning techniques.
Creating accurate sensor models for LiDARs is challenging because the output depends on complex properties such as material reflectance, ray incidence angle, and distance. For example, when laser rays encounter glass objects, they are refracted and rarely return to the sensor, a phenomenon known as raydrop. Basic LiDAR models in robotics simulators often yield simple point clouds obtained by naively casting rays at every object, and do not account for such properties. Similarly, encoding the material properties of each object in a simulator takes significant effort, which also makes it challenging to estimate the intensities of LiDAR returns – most LiDAR models in simulation do not return valid intensities at all.
In this work, we introduce a pipeline for data-driven sensor simulation and apply it to LiDAR. The key idea we propose is that, given data containing both RGB imagery and LiDAR scans, we can train a neural network to learn the relationship between appearance in RGB images and scan properties such as raydrop and intensity in LiDAR scans. A model trained this way can estimate how a LiDAR scan would look from images alone, removing the need for complex physics-based modeling.
We focus on these two key aspects of realistic LiDAR data, namely raydrop and intensity. Given that current simulations already output distances, we assume there already exists a sensor that returns a point cloud, which we then modify with our model to make it more realistic. We name our model RINet (Raydrop and Intensity Network). RINet takes an RGB image as input and predicts the realistic LiDAR characteristics corresponding to that scene through a data structure we refer to as an intensity mask. The intensity mask is a densified representation of the LiDAR scan: for each pixel in the RGB image, it reports the intensity value of the closest LiDAR return corresponding to the real-world location observed by that pixel, and a zero if the corresponding ray was dropped. Once trained, our model works in tandem with an existing simulator such as CARLA. RGB images from the simulator are passed through the trained RINet model to produce a predicted intensity mask, which is then “applied” to the original LiDAR scan, resulting in an enhanced scan; a sketch of this enhancement step is shown below.
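To make the enhancement step concrete, here is a minimal sketch of how a predicted intensity mask could be applied to a simulated point cloud. This is an illustrative assumption about the data layout, not the released implementation: it assumes each simulated point has already been projected to integer pixel coordinates in the camera image.

```python
import numpy as np

def apply_intensity_mask(points_xyz, pixel_uv, intensity_mask, drop_threshold=0.0):
    """Enhance a simulated LiDAR scan with a predicted intensity mask.

    points_xyz:      (N, 3) simulated LiDAR points (e.g., from CARLA).
    pixel_uv:        (N, 2) integer pixel coordinates of each point projected
                     into the RGB camera image (projection assumed precomputed).
    intensity_mask:  (H, W) per-pixel intensities predicted by RINet,
                     with zeros marking dropped rays.

    Returns the surviving points and their assigned intensities.
    """
    u, v = pixel_uv[:, 0], pixel_uv[:, 1]
    intensities = intensity_mask[v, u]    # look up the predicted intensity for each point
    keep = intensities > drop_threshold   # a zero in the mask means the ray is dropped
    return points_xyz[keep], intensities[keep]
```

In other words, the mask both removes points that a real sensor would not have returned and assigns realistic intensities to the points that remain.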
We train the RINet model on two real datasets, the Waymo Perception dataset and the SemanticKITTI dataset, each resulting in a distinct LiDAR model: the Waymo dataset contains data from a proprietary LiDAR sensor, whereas SemanticKITTI uses a Velodyne HDL-64E sensor. RINet leverages the well-known pix2pix architecture to go from an RGB frame to the intensity mask; a sketch of how such a generator could be invoked at inference time follows below. We find that the RINet model is effective at learning material-specific raydrop (e.g., dropping rays on materials like glass), as well as intensities (e.g., learning that car license plates are metallic objects that produce high-intensity returns).
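The sketch below shows how a trained image-to-mask generator of this kind could be used at simulation time. The weights file name, input resolution, and the use of a TorchScript export are hypothetical choices for illustration; the exact RINet architecture and training configuration are described in the paper.

```python
import torch
from torchvision import transforms
from PIL import Image

# Hypothetical exported generator; the real model follows the pix2pix
# image-to-image translation framework (see the paper for details).
generator = torch.jit.load("rinet_generator.pt").eval()

preprocess = transforms.Compose([
    transforms.Resize((256, 1024)),                        # assumed input resolution
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),   # pix2pix-style scaling to [-1, 1]
])

rgb = Image.open("camera_frame.png").convert("RGB")        # RGB frame from the simulator
with torch.no_grad():
    mask = generator(preprocess(rgb).unsqueeze(0))         # (1, 1, H, W) predicted intensity mask
mask = mask.squeeze().clamp(min=0).numpy()                 # zeros correspond to dropped rays
```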
RGB images and corresponding intensity masks from real data in the Waymo dataset. We can see noise in the LiDAR data, dropped rays on materials like glass, and intensities that vary with the observed objects.
Predictions from RINet for the same images, demonstrating that the model learns to drop rays based on the material observed in the image and to produce plausible intensities.
To validate our idea of enhancing existing simulators with this technique, we apply our model on top of LiDAR point clouds coming from the CARLA simulator. As the videos below show, the enhanced scans are qualitatively more realistic: as expected, rays are dropped on car windshields, while metallic objects such as vehicle surfaces and road signs are more prominent in the intensity map.
We also investigate whether LiDAR-specific downstream tasks benefit from this more realistic LiDAR alternative. To this end, we create a test pipeline around the task of car segmentation from LiDAR point clouds. We train segmentation models using the RangeNet++ architecture on different versions of simulated data, and then apply these models, trained solely on simulation, to the real-life Waymo dataset. We observe improved segmentation performance on real data when using the RINet-enhanced LiDAR scans compared to the default CARLA point clouds. While CARLA does provide a means to simulate noise by randomly dropping points from the LiDAR scan, our method outperforms that version as well, given the more realistic nature of the RINet outputs. Further analysis of this downstream task can be found in our paper, and a sketch of the range-image projection such segmentation models operate on is shown below.
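RangeNet++-style networks consume a spherical range-image projection of the point cloud rather than raw 3D points. The sketch below illustrates that projection under assumed field-of-view and resolution values chosen for illustration; they are not the exact configuration used in our experiments.

```python
import numpy as np

def spherical_projection(points, intensities, H=64, W=1024,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project a LiDAR point cloud into a range image, as consumed by
    RangeNet++-style segmentation networks. FOV and resolution here are
    illustrative values only."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)                        # azimuth angle
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # elevation angle

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(int) % W
    v = ((fov_up - pitch) / (fov_up - fov_down) * H).clip(0, H - 1).astype(int)

    range_img = np.zeros((H, W), dtype=np.float32)
    intensity_img = np.zeros((H, W), dtype=np.float32)
    order = np.argsort(-r)                        # fill far points first so near ones overwrite
    range_img[v[order], u[order]] = r[order]
    intensity_img[v[order], u[order]] = intensities[order]
    return range_img, intensity_img
```

The same projection is applied to both the simulated training scans and the real Waymo scans, so the segmentation model sees a consistent input representation across sim and real data.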
Neural networks are powerful function approximators that have already shown impact across a vast array of fields, particularly in robotics and autonomous systems. Given the integral role simulations play in robotics and deep learning, and the effort required to build complex simulation components such as sensor models, we are excited to present data-driven sensor simulation as a new paradigm. We show a pipeline in which machine learning and traditional simulators coexist to generate realistic sensor observations, and apply it to LiDAR sensors. Our framework implicitly encodes sensor properties by learning purely from observations, bypassing the need for expensive handcrafting of LiDAR models. In the future, we envisage similar efforts helping reduce the barrier to creating rich simulations for various kinds of sensors which, in turn, enable the creation of robust autonomous systems.
This work was a joint effort between Microsoft’s Autonomous Systems and Robotics Research group, the Computer Vision Laboratory at EPFL, Microsoft Research Redmond and the Microsoft Mixed Reality & AI lab in Zurich. The researchers who took part in this project are Benoît Guillard, Sai Vemprala, Jayesh Gupta, Ondrej Miksik, Vibhav Vineet, Pascal Fua and Ashish Kapoor.