Pairwise depth estimation
The PDE (Pairwise depth estimation) application uses the camera-captured 2D images to generate depthmaps and produce, for each camera, a stream of 2D images and a stream of depthmaps. See the vocabulary for an introduction to the terms used.
The process occurs on two Jetson TK1 devices (Master and Slave). Input to the process is two captured camera images - I1 on the master Jetson and I2 on the slave Jetson. The two cameras that recorded I1 and I2 have different views of the same scene.
The four main stages are Image cross-sharing, Depth estimation, Depth cross-sharing and Depth blending.
Information sharing between devices is shown as numbered arrows in the diagram. Each arrow is referred to below as "transfer N", where N is the arrow's number.
The master Jetson initiates the entire process by sending an instruction (transfer 0) to the slave Jetson. Transfer 0 is sent when the master receives a new I1. Immediately after sending transfer 0, the master sends a section (*1) of I1 to the slave, through the transfer 1 channel.
Transfer 0, when received by the slave, causes the slave to send a complementary section (*2) of I2 to the master, shown in the diagram as transfer 2.
* Note on sections:
- An image is separated into two 50% parts using a straight dividing line.
- Section size is 50% of an image + a safety padding of x%. The dividing line position can be left side + padding, top + padding, or any other half-selection of the image.
- The complementary section is also 50% of an image + a safety padding of x%. However, the complementary section is chosen such that the complementary 50% + the first section's 50% = the complete image.
- The padding added to both sections is a border around the dividing line, with a uniform thickness of some number of pixels (x% of the image resolution along the axis perpendicular to the dividing line).
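The section rules above can be sketched as a small index calculation. This is only an illustration under assumptions not stated in the text: the dividing line is vertical and sits at the image centre, the padding is rounded to whole pixels, and the function name `split_sections` is hypothetical.

```python
def split_sections(width, padding_percent):
    """Return column ranges (start, stop) for a section and its complement.

    Assumption: a vertical dividing line at the image centre. The text allows
    any straight half-selection, so this is just one possible choice.
    """
    half = width // 2
    # Padding thickness in pixels, measured perpendicular to the dividing line.
    pad = round(width * padding_percent / 100)
    section = (0, min(width, half + pad))        # left 50% + padding
    complement = (max(0, half - pad), width)     # right 50% + padding
    return section, complement


section, complement = split_sections(width=640, padding_percent=5)
print(section, complement)  # (0, 352) (288, 640)
```

With these assumptions the two sections overlap in a band of 2 x pad columns centred on the dividing line, while the unpadded halves still add up to the complete image.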
As a result of transfer 1, the slave device has an overlapping section of I1 and I2, which can be used to estimate depth for one half of the scene. As a result of transfer 2, the master device has the complementary overlapping section of I2 and I1, used to estimate depth for the other half of the scene.
Depth estimation is currently a stub process. Depthmap output is simulated by producing monochrome images from the input views, with the same pixel data format as the actual depthmaps.
Depth estimation may be preceded by rectification (of the inputs) and followed by de-rectification (of the outputs).
The input to the depth estimation algorithm is two color images representing two different views. The output of this algorithm is two depthmaps, each depthmap matching one of the input views.
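A stub with this interface could look roughly like the sketch below. The text does not say how the monochrome images are derived, so the channel averaging, the scaling to 16 bits, the list-of-rows image representation and the name `stub_depth_estimation` are all assumptions made for illustration.

```python
def stub_depth_estimation(view_a, view_b):
    """Return one stand-in 'depthmap' per input view (hypothetical stub).

    Each view is a list of rows of (r, g, b) tuples with 8-bit channels.
    Each output has the same rows and columns, with one 16-bit value per pixel.
    """
    def to_mono16(view):
        # Assumed conversion: average the channels, then scale 0..255
        # to 0..65535 (255 * 257 == 65535).
        return [[(r + g + b) // 3 * 257 for (r, g, b) in row] for row in view]

    return to_mono16(view_a), to_mono16(view_b)


d_a, d_b = stub_depth_estimation([[(255, 255, 255), (0, 0, 0)]], [[(3, 6, 9)]])
print(d_a, d_b)  # [[65535, 0]] [[1542]]
```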
On the master device, the overlapping halves of I1 and I2 are sent into the depth estimation algorithm. The algorithm produces depthmaps D1(master) and D2(master).
On the slave device, the overlapping complementary halves of I1 and I2 are sent into the algorithm. The algorithm produces depthmaps D1(slave) and D2(slave).
At this stage, neither device has a full depthmap for either view; however, both devices hold opposing halves of the depthmaps for both views.
Once the depth estimation algorithm has finished on the master device, the master device sends the estimated D2(master) depthmap half to the slave device. In the diagram, this is shown as transfer 3.
Once the depth estimation has finished on the slave, the slave sends the estimated D1(slave) depthmap half to the master device. In the diagram, this is shown as transfer 4.
When the master device sends off transfer 3 and receives transfer 4, it proceeds to the blending stage.
When the slave device sends off transfer 4 and receives transfer 3, it proceeds to the blending stage.
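The transfers described so far, and the condition for entering the blending stage, can be summarised in a short sketch. The table only restates the text; the names `TRANSFERS`, `DEPTH_TRANSFERS` and `ready_to_blend` are hypothetical and do not come from the application.

```python
# Transfers as described in the text: number -> (sender, receiver, payload).
TRANSFERS = {
    0: ("master", "slave", "start instruction"),
    1: ("master", "slave", "section of I1"),
    2: ("slave", "master", "complementary section of I2"),
    3: ("master", "slave", "D2(master)"),
    4: ("slave", "master", "D1(slave)"),
}

# Transfers that belong to the depth cross-sharing stage.
DEPTH_TRANSFERS = {3, 4}


def ready_to_blend(device, completed):
    """True once `device` has sent and received its depth cross-sharing transfers."""
    needed = {n for n in DEPTH_TRANSFERS if device in TRANSFERS[n][:2]}
    return needed <= set(completed)


print(ready_to_blend("master", {0, 1, 2, 3}))     # False: transfer 4 not yet received
print(ready_to_blend("master", {0, 1, 2, 3, 4}))  # True
```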
Depth blending is a simple process of reconstructing a full-view depthmap from two half-depthmaps of the same view. The two half-depthmaps partially overlap (the safety padding), and this overlap is used to smoothly blend the depth information along the seam.
The master device uses D1(master) and D1(slave) to compose D1. D1 has the same resolution and view as I1.
The slave device uses D2(master) and D2(slave) to compose D2. D2 has the same resolution and view as I2.
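As an illustration of blending along the seam, the sketch below cross-fades one depthmap row linearly over the overlap. The text only says the blend is smooth, so the linear weighting is an assumption, as are the vertical dividing line, the use of already decoded depth values (not raw 16-bit words) and the name `blend_row`.

```python
def blend_row(left, right, left_stop, right_start):
    """Blend one depthmap row from two half-rows (hypothetical helper).

    `left` covers columns [0, left_stop); `right` covers [right_start, width).
    The overlap [right_start, left_stop) is cross-faded linearly (assumed).
    """
    width = right_start + len(right)
    overlap = left_stop - right_start
    out = []
    for x in range(width):
        if x < right_start:
            out.append(left[x])                # only the left half has data
        elif x >= left_stop:
            out.append(right[x - right_start]) # only the right half has data
        else:
            # Weight shifts from the left half to the right half across the seam.
            w = (x - right_start + 1) / (overlap + 1)
            out.append((1 - w) * left[x] + w * right[x - right_start])
    return out


row = blend_row([10.0] * 4, [20.0] * 4, left_stop=4, right_start=2)
print(len(row), row[0], row[-1])  # 6 10.0 20.0
```

Where the two halves agree inside the overlap, the cross-fade leaves the value essentially unchanged; where they differ, the result moves gradually from one estimate to the other.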
Depth map format
Currently, the expected depth map format is a 2D matrix of depth values. Matrix rows and columns correspond to the image rows and columns. The depth values are 2-byte floats. The matrix can be passed into video/image encoders as a Mono16 monochrome image.
Depth value 16-bit float:
- There is no sign bit; all depth values are presumed positive.
- Exponent occupies the first (MSB) 4 bits.
- The significand takes the remaining 12 bits.
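The bit layout above can be sketched as a pair of field helpers. The text does not specify the exponent bias or whether the significand has an implicit leading bit, so the sketch stops at splitting and joining the two fields and does not convert them to a depth value. The function names are hypothetical.

```python
def unpack_depth_word(word):
    """Split a 16-bit depth word into (exponent, significand) fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    exponent = word >> 12        # first (MSB) 4 bits
    significand = word & 0x0FFF  # remaining 12 bits
    return exponent, significand


def pack_depth_word(exponent, significand):
    """Join a 4-bit exponent and a 12-bit significand into one 16-bit word."""
    if not (0 <= exponent <= 0xF and 0 <= significand <= 0xFFF):
        raise ValueError("field out of range")
    return (exponent << 12) | significand


print(unpack_depth_word(0xA123))            # (10, 291)
print(hex(pack_depth_word(10, 0x123)))      # 0xa123
```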
The description above covers the operation of a single pair of devices. The process scales to more devices by increasing the total number of pairs.
commit: $Id: pde.process-flow-diagram.md 660 2018-03-16 07:19:31Z elidim $