The Proof of Concept

The Sensor

The system utilized a single sensor, the Stereolabs ZED stereo camera. It provides a color image and can deliver a point cloud out to 25 meters, which, given the constraints of maximum vehicle speed and desired response time, was sufficient for the task. Using a single sensor kept the setup very simple, as no cross-calibration or synchronization was required. The sensor also performed very well, returning an impressively dense point cloud at 1080p and 30fps.
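For reference, grabbing a single color frame and point cloud from the camera looks roughly like the following. This is a minimal sketch using the ZED Python API (pyzed, with SDK 3.x naming); the depth-mode choice and the error handling are illustrative assumptions rather than details of the actual setup.

    import pyzed.sl as sl

    zed = sl.Camera()
    init = sl.InitParameters()
    init.camera_resolution = sl.RESOLUTION.HD1080  # 1080p, as in the proof of concept
    init.camera_fps = 30                           # 30fps, as in the proof of concept
    init.depth_mode = sl.DEPTH_MODE.ULTRA          # assumed depth mode for this sketch

    if zed.open(init) != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError("failed to open ZED camera")

    image, cloud = sl.Mat(), sl.Mat()
    runtime = sl.RuntimeParameters()
    if zed.grab(runtime) == sl.ERROR_CODE.SUCCESS:
        zed.retrieve_image(image, sl.VIEW.LEFT)          # left color image
        zed.retrieve_measure(cloud, sl.MEASURE.XYZRGBA)  # per-pixel point cloud
    zed.close()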

Processing the Images

Because the stereo camera's point cloud is dense enough that no data needed to be accumulated across frames, the system operated one-shot, processing each frame independently. This also greatly simplifies system documentation, like the operating diagram shown below. CPU compute is shown in blue and GPU compute in green.

A flow chart showing the order in which the system progressed. In the middle the flow splits, demonstrating parallel processing steps, and reconnects at the end.

  1. Images are pulled from the camera over USB. This is performed by the API.
  2. The color image is passed to the GPU.
  3. The image's color space is converted from RGB to HSL for more accurate segmentation.
  4. The sampled color of a region is compared to the rest of the image to extract surfaces that match the sampled region.
  5. The stereo camera system does no on-board processing; instead, the heavy lifting is done with CUDA via OpenCV on the host. Conveniently, the camera's API exposes the image pointer in the GPU's memory, avoiding the expensive I/O to the CPU and back. This is performed by the API.
  6. A sampled region immediately in front of the vehicle is assumed to be flat with respect to the vehicle. Surface normals elsewhere in the frame that fall within a certain angle of the sampled region's normals are considered flat as well.
  7. The two eligible areas are then combined with a logical AND. Filtering is then performed to fill in gaps in the largest blob and to reduce the number of vertices around its edges.
  8. The edges are extracted from the main blob after removing proportional sections at the top and bottom: the top because it is so far away that it will not be reached for another second, and the bottom because the vehicle will have reached it by the time the frame is computed. (A rough sketch of steps 3 through 8 follows this list.)
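To make the two parallel branches concrete, below is a minimal per-frame sketch of steps 3 through 8, assuming an RGB image and an organized point cloud as inputs. The actual system ran the equivalent operations on the GPU through OpenCV's CUDA module; this sketch uses CPU OpenCV and NumPy for clarity, and every threshold, patch location, and kernel size is an illustrative assumption rather than a value from the real implementation. (OpenCV names the HSL color space HLS.)

    import cv2
    import numpy as np

    # Assumed tolerances; the real system's values are not documented here.
    HUE_TOL, LIGHT_TOL = 10, 40          # color tolerance in OpenCV's HLS space
    NORMAL_ANGLE_TOL = np.deg2rad(15.0)  # "still flat" cone around the sampled normal

    def flat_drivable_mask(rgb, points):
        """rgb: HxWx3 uint8 color image; points: HxWx3 float32 organized point cloud."""
        h, w = rgb.shape[:2]
        points = np.nan_to_num(points)  # invalid depth pixels come back as NaN
        # Sample a patch just in front of the vehicle (bottom-center of the frame).
        patch = (slice(int(0.90 * h), h), slice(int(0.45 * w), int(0.55 * w)))

        # Steps 3-4: convert to HLS and keep pixels whose color matches the sample.
        hls = cv2.cvtColor(rgb, cv2.COLOR_RGB2HLS)
        sample = hls[patch].reshape(-1, 3).mean(axis=0)
        lo = np.clip(sample - (HUE_TOL, LIGHT_TOL, 255), 0, 255).astype(np.uint8)
        hi = np.clip(sample + (HUE_TOL, LIGHT_TOL, 255), 0, 255).astype(np.uint8)
        color_mask = cv2.inRange(hls, lo, hi)

        # Steps 5-6: surface normals from the point cloud, compared against the
        # normal of the sampled patch, which is assumed flat relative to the vehicle.
        dx = np.gradient(points, axis=1)
        dy = np.gradient(points, axis=0)
        normals = np.cross(dx, dy)
        normals /= np.linalg.norm(normals, axis=2, keepdims=True) + 1e-9
        ref = normals[patch].reshape(-1, 3).mean(axis=0)
        ref /= np.linalg.norm(ref) + 1e-9
        angle = np.arccos(np.clip(np.abs(normals @ ref), 0.0, 1.0))
        flat_mask = np.where(angle < NORMAL_ANGLE_TOL, 255, 0).astype(np.uint8)

        # Step 7: logical AND of the two eligible areas, then fill gaps in the blob.
        mask = cv2.bitwise_and(color_mask, flat_mask)
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

        # Step 8: keep the largest blob, simplify its boundary to a small set of
        # vertices, and drop the far (top) and near (bottom) bands of the frame.
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return mask, None
        blob = max(contours, key=cv2.contourArea)
        edge = cv2.approxPolyDP(blob, 3.0, True).reshape(-1, 2)
        edge = edge[(edge[:, 1] > 0.20 * h) & (edge[:, 1] < 0.90 * h)]
        return mask, edge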

This whole system operated at 30fps with 150ms of latency from input frame to extracted edges. It ran on a laptop with an Intel i7-6820HQ and an Nvidia M1000M (2GB) under Ubuntu 16.04 LTS. The laptop showed no more than 25% CPU usage while simultaneously recording data to file and displaying the input and final processed images in real time. The cooling fan did not noticeably rise above idle, suggesting the GPU load was similarly modest relative to its capabilities.

Room for Improvement

As with any proof of concept, there's a lot to change for the prototype. Since this was essentially just a pair of cameras and a computer, practically all of the complexity lives in the computation. That computation has two halves, though: the image processing performed by the camera's API, and the edge extraction.

The camera produced a surprisingly dense point cloud. Relative to past experience working on and with stereo camera vision systems, and to the state of the art, its performance was impressive and intriguing. A hint as to how came while testing indoors, when some strange depth readings occurred on repetitive fabric patterns. On a particular blue and white chevron pattern on a pillow 10 feet away, the camera placed the white part of the pillow at the correct distance, but placed the blue part another 5 feet further back.

After some further investigation, it appears the depth measurement was performed using a neural engine. This also explained some of the results where inconsistent depth readings occurred over a fairly uniform but highly textured surface. So while the depth readings were dense, they were not always accurate. A stunning example of this is below, where the sampled space contained some erroneous surface normals, leading the system to believe the sides of a hill and a wooded area to be flat (the area encircled in red).

A picture of a two-lane road cutting across the side of a small, shallow hill, such that the hill continues up to the right and down to the left toward a wooded area, and the road continues in a gentle curve to the left that disappears behind the wooded area at around 80 feet. The red contour line extends well to the right of the road, up the hill almost to a rock wall where the slope is steep enough that it would not be possible to drive at speed. The red contour also continues well to the left of the road, down a slope that would be similarly impossible to traverse in an on-road vehicle, and encapsulates a large area of trees.

Because the system was also designed to detect color differences, however, it is able to largely differentiate the road surface from the non-road areas.

The same base image as above; however, the erroneous area to the left, down the hill and into the woods, has been completely eliminated. The erroneous area to the right has been almost entirely eliminated as well, except for a small patch about 15 feet ahead of the vehicle on a significantly flatter portion of the hill. Given the small number of vertices in the erroneous area compared to the edge of the road surface, a regression line may largely eliminate that section.
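To illustrate that regression idea, a robust line fit over the extracted edge vertices could look something like the sketch below. The fit_edge_line helper and the choice of cv2.fitLine with a Huber distance are assumptions for illustration, not part of the original system.

    import cv2
    import numpy as np

    def fit_edge_line(edge_points):
        """Fit a robust line to Nx2 edge vertices so that a few stray points,
        like the small erroneous patch on the hillside, barely shift the road edge."""
        pts = np.asarray(edge_points, dtype=np.float32).reshape(-1, 1, 2)
        # DIST_HUBER down-weights outliers compared with a plain least-squares fit.
        vx, vy, x0, y0 = cv2.fitLine(pts, cv2.DIST_HUBER, 0, 0.01, 0.01).ravel()
        return (float(x0), float(y0)), (float(vx), float(vy))  # point on line, unit direction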

Unfortunately, the neural engine meant there was no explainable vision pipeline, though this requirement may change in the future. Attempting to use more traditional, explainable methods did not yield the density or range needed for the target vehicle speed. From the experiments performed, it seems unlikely that any passive camera system can handle the low texture of fresh snow or fresh asphalt. This seems to imply that, for the foreseeable future, an active ranging system will be required to get any useful level of range and reliability in distance measurements. With that in mind, work began on the next iteration.

