Practically everything. The introduction of LIDAR is a large deviation from the old system's operation. The stereo camera is no longer functional, so a single color camera is being used instead. Stereo depth might be used to fill in gaps in the pointcloud if necessary.
As mentioned in the assessment of the previous system, the distance measurements were neither accurate nor precise enough for this system. So, like many of the large players in the autonomous vehicle space, active ranging methods are being introduced. As of early 2020, this means LIDAR. While LIDAR is a fantastic technology, there are some drawbacks to the industry's current implementations, elaborated upon further in this analysis.
As this project has been entirely self-funded, acquiring a high-end LIDAR system is not feasible even if the opportunity to acquire one arises. In early 2019 a Benewake CE30-D was acquired. This sensor appears to be the first mass-produced 3D solid-state LIDAR available for public purchase. Trials with this unit revealed how severely LIDAR range is reduced on asphalt pavement: the sensor's usable range dropped to around 10m. Given that the system requires at least 20m of distance measurements to evaluate roadway flatness sufficiently far in advance at 40mph, a longer-range unit is required.
Judging from my experimental results and the spec sheets of many LIDAR units, a unit with at least 60m of range at 80% reflectivity would be required to achieve 20m of range at 10% reflectivity. To use it on fresh pavement with 5% reflectivity, the range at 80% reflectivity would likely need to be at least 80m. Other than a used Velodyne VLP-16 with a range of about 100m @ 80% reflectivity, the only other readily available sensors with sufficient range appear to be from Livox.
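These numbers follow from a simple first-order model: if the returned power falls off as reflectivity over range squared, then the usable range scales with the square root of reflectivity. The sketch below is just that back-of-the-envelope check, not a property of any specific sensor.

```python
# Rough sanity check of the reflectivity scaling above. Assumes returned
# power ~ reflectivity / range^2, so detectable range ~ sqrt(reflectivity).

def range_at_reflectivity(spec_range_m, spec_refl, target_refl):
    """Estimate usable range at target_refl from a spec'd range at spec_refl."""
    return spec_range_m * (target_refl / spec_refl) ** 0.5

print(range_at_reflectivity(60, 0.80, 0.10))  # ~21.2 m on 10% asphalt
print(range_at_reflectivity(80, 0.80, 0.05))  # ~20.0 m on 5% fresh pavement
```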
Livox's sensors have a max listed range of at least 260m (seemingly at 80% reflectivity). The specifications are impressive, but the non-repetitive scan pattern means the surface might not be scanned as densely as the vehicle moves as it would be with a static scanline. The non-repetitive pattern also makes some localization algorithms more difficult to compute. While not technically solid-state LIDAR (the non-repetitive pattern is created by spinning prisms), there are no spinning electronics, making the units significantly cheaper to produce and more reliable than traditional spinning LIDAR systems.
Due to its performance and its provided software suite, including a SLAM implementation that can operate at up to 80 kph, a Livox Horizon was chosen as the primary LIDAR for this system.
Image processing is practically the only thing that remains from the proof of concept, and even then only the color image half of it survived. The operating principle is still sound, but a significantly deeper dive into color spaces and the perception of color has been done, so there will be changes to the color sampling and comparison algorithms (a report on this will be forthcoming after some trials and once more sources are gathered).
As previously mentioned, stereo depth is being considered to fill in gaps in the LIDAR measurements if required. Visual-inertial SLAM is also being considered, as it remains to be seen whether a largely uniform surface like a roadway, plus the small amount of roadside seen by the short-range LIDAR, will be enough to localize on.
With the limitations of LIDAR, the system is no longer capable of being one-shot. As multiple measurements will be aggregated over time as the vehicle moves, a vehicle position state estimator has to be used to correct the scans. This also means an IMU needed to be introduced to measure that motion. Fortunately, the Livox Horizon contains an IMU, so an additional component is not required. Hopefully the sole LIDAR will produce a sufficiently dense pointcloud that a second LIDAR is not needed. If required, this second LIDAR would be the Benewake unit, used to increase short-range density for higher-confidence line detection.
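The aggregation step amounts to transforming each scan into a common frame using the pose estimate for that scan. A minimal sketch is below; `poses` and `scans` are hypothetical names, and a real implementation would also need the sensor-to-vehicle extrinsic and per-point timestamps to de-skew a spinning-prism unit like the Horizon mid-scan.

```python
import numpy as np

def aggregate_scans(scans, poses):
    """Merge LIDAR scans into one world-frame pointcloud.

    scans: list of (N, 3) point arrays in the sensor frame.
    poses: list of 4x4 homogeneous sensor-to-world transforms,
           assumed to come from the IMU-driven state estimator.
    """
    world_points = []
    for points, T in zip(scans, poses):
        homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
        world_points.append((T @ homo.T).T[:, :3])                 # world frame
    return np.vstack(world_points)
```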
A single color camera will be used for the color analysis. A couple of mono cameras are also being considered if pointcloud density is still too low even with the second LIDAR. As the primary LIDAR is a continuous-scan device, device synchronization is not necessary. Data acquisition can then be done simply, with one thread per sensor running at 30 Hz. At the current speed constraint of 40mph, the vehicle will travel a little under 0.6 meters in one 30 Hz loop. At the hoped-for speed of 60mph, the maximum distance in one 30 Hz loop is a little under 0.9 meters. Over that short period, the IMU shouldn't drift far enough to impede localization from the LIDAR point correction.
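The per-sensor acquisition loop could look something like the sketch below, where `read_fn` stands in for whatever driver call grabs a frame or scan; the queue and the pacing scheme are assumptions, not a reference design.

```python
import queue
import threading
import time

def sensor_loop(read_fn, out_queue, stop_event, hz=30):
    """Poll one sensor at a fixed rate, pushing timestamped samples."""
    period = 1.0 / hz
    while not stop_event.is_set():
        start = time.monotonic()
        out_queue.put((start, read_fn()))
        # Sleep off whatever remains of the ~33 ms budget.
        time.sleep(max(0.0, period - (time.monotonic() - start)))

# One thread per sensor, e.g. (camera.read is a hypothetical driver call):
# stop = threading.Event()
# threading.Thread(target=sensor_loop,
#                  args=(camera.read, queue.Queue(), stop)).start()
```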
The OBDII standard provides commands to access the vehicle's velocity in real time through the diagnostic port; this value is reported in kph as an 8-bit number. Given a 0 to 60mph time of 3 seconds (supercar territory) and using a constant acceleration model, the vehicle's speed will have changed by only about 1 kph after one 30 Hz loop, assuming no sensor latency. So if the localization can perform at 30 Hz, there will be no need for velocity correction. If the localization requires more time, this vehicle velocity correction will be considered.
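For reference, the arithmetic behind that figure:

```python
# Worst-case speed change per loop under the constant-acceleration model.
MPH_TO_KPH = 1.609344
accel = 60 * MPH_TO_KPH / 3.0   # 0-60 mph in 3 s -> ~32.2 kph per second
print(accel / 30.0)             # ~1.07 kph change in one 30 Hz loop
```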
As discussed in the project introduction, it is reasonable to assume that a safety-certifiable solution is not achievable. In that situation, leveraging contemporary AI and ML techniques, as seen in Tesla's ADAS systems, would likely be the most effective approach. This would likely be a two-step system: one step to detect the road surface from the raw inputs, and one to detect the lanes from the road surface. A one-step system is possible, but for future extensions of the system it would likely be more beneficial to have two steps.
To get the road surface, the pointcloud would be projected into the frame of the camera such that it matches the field of view and resolution of the image. This would then be fed into a deep convolutional neural network as additional channels alongside the image. The network would produce a semantic segmentation of the image into road surface and not road surface, as a single-channel output image with field of view and resolution matching the input. In the future, more channels could be added for obstacles like other vehicles and pedestrians.
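A minimal sketch of that projection step, assuming a pinhole camera model with a known intrinsic matrix `K` and a known LIDAR-to-camera extrinsic `T_cam_lidar` from calibration (all names here are illustrative, not from any particular driver):

```python
import numpy as np

def project_to_depth_image(points_lidar, K, T_cam_lidar, width, height):
    """Rasterize LIDAR points into a depth channel aligned with the image."""
    homo = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ homo.T)[:3]          # 3 x N in the camera frame
    pts_cam = pts_cam[:, pts_cam[2] > 0]          # keep points in front of camera
    uvw = K @ pts_cam                             # pinhole projection
    u = (uvw[0] / uvw[2]).astype(int)
    v = (uvw[1] / uvw[2]).astype(int)
    depth = np.zeros((height, width), dtype=np.float32)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[ok], u[ok]] = pts_cam[2, ok]          # last point wins per pixel
    return depth
```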
To get the lanes, the output image from the previous neural network and the desired direction information are passed into another deep convolutional neural network. The desired direction is encoded as an image by highlighting the left, center, or right third of a blank image to denote turning left, going straight, or turning right, respectively. This scheme obviously breaks down if an intersection has more than four ways, but instances of this on unmarked roads are extremely rare. The network then outputs an image with the same resolution as the input and four channels: the left side of the road, the center line of the road, the right side of the road, and the recommended line of travel.
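The direction encoding itself is trivial; a sketch under the thirds scheme described above:

```python
import numpy as np

def direction_image(direction, width, height):
    """Encode a turn intent by highlighting one third of a blank image."""
    img = np.zeros((height, width), dtype=np.float32)
    third = width // 3
    start = {"left": 0, "straight": third, "right": 2 * third}[direction]
    img[:, start:start + third] = 1.0
    return img
```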
As training these networks is a form of supervised learning, a lot of data will be required. If new sensors are added, that data will need to be gathered again with the new sensors involved. Since gathering this data would be prohibitively expensive to outsource, the sensor suite should be finalized first. Therefore, data collection will occur once the system has been shown to be effective with the previously investigated methods.
As of early 2021, the following has been completed:
Thanks to the software suite provided with the new LIDAR, the following items should be completed and the previous work can be discarded:
Upcoming Work: