
I have a quadrotor which flies around and knows its x, y, z position and its angular displacement about the x, y, and z axes. It captures a constant stream of images which are converted into depth maps (we can estimate the distance between each pixel and the camera).

How can one program an algorithm which converts this information into a 3D model of the environment? That is, how can we generate a virtual 3D map from this information?

Example: below is a picture that illustrates what the quadrotor captures (top) and the depth map it is converted into before being fed to a 3D mapping algorithm (bottom).

[image: camera capture (top) and corresponding depth map (bottom)]

Let's suppose this image was taken from a camera with x, y, z coordinates (10, 5, 1) in some units and angular displacement of 90, 0, 0 degrees about the x, y, z axes. What I want to do is take a bunch of these photo-coordinate tuples and convert them into a single 3D map of the area.

Edit 1 on 7/30: One obvious solution is to use the quadrotor's angles about the x, y, and z axes together with the distance map to work out the Cartesian coordinates of any obstructions with trigonometry. I figure I could write an algorithm which uses this approach with a probabilistic method to build a crude 3D map, possibly vectorizing it to make it faster.
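
Concretely (using my own shorthand, not established notation), each depth reading would map to a world point roughly as

    p_world = p_quad + R(roll, pitch, yaw) * d(u, v) * ray(u, v)

where ray(u, v) is the viewing direction of pixel (u, v) in the camera frame, d(u, v) is its measured depth, and R is the rotation built from the quadrotor's angular displacement.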

However, I would like to know: is there a fundamentally different and hopefully faster approach to solving this?

Shrey Joshi

1 Answer


Simply convert your data to Cartesian coordinates and store the result. As you know the topology (the spatial relation between data points) of the input data, you can map directly to a mesh/surface instead of to a point cloud (which would otherwise require triangulation, a convex hull, etc.).

Your images suggest you have known topology (neighboring pixels are also neighbors in 3D), so you can construct the 3D mesh surface directly:

  1. align both RGB and Depth 2D maps.

In case this is not already done, the RGB and depth images need to be registered (aligned) with each other first.

  2. convert to Cartesian coordinate system.

    First we compute the position of each pixel in camera local space:

[image: projection geometry]

So for each pixel (x,y) in the RGB map we find the depth (distance to the camera focal point) and compute its 3D position relative to the focal point. For that we can use triangle similarity:

     x = camera_focus.x + (pixel.x-camera_focus.x)*depth(pixel.x,pixel.y)/focal_length
     y = camera_focus.y + (pixel.y-camera_focus.y)*depth(pixel.x,pixel.y)/focal_length
     z = camera_focus.z +                          depth(pixel.x,pixel.y)
    

where pixel is the pixel's 2D position, depth(x,y) is the corresponding depth, and focal_length = znear is the fixed camera parameter (determining the FOV). camera_focus is the camera focal point position. It is usual for the camera focal point to be in the middle of the camera image, at distance znear from the image (projection plane).

As this is taken from a moving device, you then need to convert the result into some global coordinate system using your camera position and orientation in space, typically a rotation (built from the reported angles) followed by a translation, i.e. a 4x4 homogeneous transform (see the code sketch after this list).

  3. construct mesh

As your input data is already spatially sorted, we can construct a QUAD grid directly: for each pixel, take its neighbors and form QUADs. So if the 2D position (x,y) in your data is converted into 3D (x,y,z) with the approach described in the previous bullet, we can write it in the form of a function that returns the 3D position:

    (x,y,z) = 3D(x,y)
    

Then we can form QUADs like this:

    QUAD( 3D(x,y),3D(x+1,y),3D(x+1,y+1),3D(x,y+1) )
    

To cover the whole map we can use for loops:

    for (x=0;x<xs-1;x++)
     for (y=0;y<ys-1;y++)
      QUAD( 3D(x,y),3D(x+1,y),3D(x+1,y+1),3D(x,y+1) )
    

where xs,ys are the resolution of your maps (a runnable sketch of this step is given below the list).
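
To make bullet 2 concrete, here is a minimal C++ sketch. The names (CameraPose, pixelToCamera, cameraToWorld) and the rotation order (x, then y, then z) are my own assumptions, not part of the answer above; adapt them to however your quadrotor reports its pose:

    #include <cmath>

    struct Vec3 { double x, y, z; };

    // hypothetical pose/camera description; field names are assumptions
    struct CameraPose
    {
        double focal_length;     // znear, in pixel units
        double cx, cy;           // camera_focus: principal point, usually the image center
        Vec3   position;         // quadrotor position in world units
        double roll, pitch, yaw; // angular displacement about x, y, z in radians
    };

    // bullet 2, part 1: back-project pixel (px,py) with depth d into camera-local space
    // (unlike the formulas above, the principal point offset is not added back,
    //  so the result is relative to the focal point)
    Vec3 pixelToCamera(const CameraPose &c, double px, double py, double d)
    {
        Vec3 p;
        p.x = (px - c.cx) * d / c.focal_length; // triangle similarity
        p.y = (py - c.cy) * d / c.focal_length;
        p.z = d;
        return p;
    }

    // bullet 2, part 2: rotate about x, y, z and translate into the global frame
    Vec3 cameraToWorld(const CameraPose &c, Vec3 p)
    {
        // rotation about the x axis (roll)
        double y1 = p.y * std::cos(c.roll) - p.z * std::sin(c.roll);
        double z1 = p.y * std::sin(c.roll) + p.z * std::cos(c.roll);
        p.y = y1; p.z = z1;
        // rotation about the y axis (pitch)
        double x1 =  p.x * std::cos(c.pitch) + p.z * std::sin(c.pitch);
        double z2 = -p.x * std::sin(c.pitch) + p.z * std::cos(c.pitch);
        p.x = x1; p.z = z2;
        // rotation about the z axis (yaw)
        double x2 = p.x * std::cos(c.yaw) - p.y * std::sin(c.yaw);
        double y2 = p.x * std::sin(c.yaw) + p.y * std::cos(c.yaw);
        p.x = x2; p.y = y2;
        // translate by the quadrotor position
        p.x += c.position.x; p.y += c.position.y; p.z += c.position.z;
        return p;
    }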

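Bullet 3 can be sketched the same way. Assuming the converted points are stored in a row-major xs*ys array (one world-space point per depth-map pixel), indexing replaces the 3D(x,y) call; Quad and buildQuadGrid are again my own names:

    #include <vector>

    // four vertex indices per QUAD, referring to a row-major xs*ys point array
    struct Quad { int i00, i10, i11, i01; };

    std::vector<Quad> buildQuadGrid(int xs, int ys)
    {
        std::vector<Quad> quads;
        quads.reserve((xs - 1) * (ys - 1));
        for (int x = 0; x < xs - 1; x++)
            for (int y = 0; y < ys - 1; y++)
                // same ordering as QUAD( 3D(x,y), 3D(x+1,y), 3D(x+1,y+1), 3D(x,y+1) )
                quads.push_back(Quad{  y      * xs +  x,
                                       y      * xs + (x + 1),
                                      (y + 1) * xs + (x + 1),
                                      (y + 1) * xs +  x });
        return quads;
    }
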
In case you do not know the camera properties you can set the focal_length to any reasonable constant (resulting in fish-eye effects and/or scaled output) or infer it from the input data.
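
For example, if the horizontal field of view is known, a common estimate (my own helper, not from the original answer) is:

    #include <cmath>

    // focal length in pixels from a known horizontal FOV (radians) and image width xs
    double focalFromFov(double hfov, int xs)
    {
        return 0.5 * xs / std::tan(0.5 * hfov);
    }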

Spektre