
For a project, I have been attempting to transform the depth map given by libfreenect (a 480 by 640 matrix with depth values 0-255) into more usable (x,y,z) coordinates.

I originally assumed that the depth value d at each pixel represented the Euclidean distance between the sensor and the measured point. By representing the camera as a point and the matrix as a virtual image plane, and by following the ray from the camera through each pixel on that plane for a distance d, I reconstructed what I thought were the actual coordinates (each point lies at distance d along the ray cast through its pixel). As is evident in Figure 1 below, the reconstructed room map (shown from above) is distorted.

Figure 1: d is Euclidean Distance
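For concreteness, here is a minimal sketch of that first reconstruction. The focal length f and optical center (cx, cy) are placeholder values I chose for illustration; libfreenect does not supply them directly.

#include <cmath>

// Sketch: reconstruct (x, y, z) assuming d is the Euclidean distance
// along the ray through pixel (u, v).
// f, cx, cy are placeholder intrinsics, not values from libfreenect.
void euclideanToXYZ(int u, int v, double d,
                    double& x, double& y, double& z) {
  const double f = 580.0, cx = 320.0, cy = 240.0;
  double dx = (u - cx) / f;                          // ray direction
  double dy = (v - cy) / f;
  double norm = std::sqrt(dx * dx + dy * dy + 1.0);  // length of (dx, dy, 1)
  x = d * dx / norm;                                 // unit ray scaled by d
  y = d * dy / norm;
  z = d / norm;                                      // forward component
}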

If I instead assume d represents the forward distance from the camera to each point, the result is shown below in Figure 2. Note the triangular shape: measured points lie along rays projected from the robot's position, so the x and y coordinates are scaled by the depth, while z is the depth value d itself.

Figure 2: d is depth from camera, or z coordinate
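In code, this second interpretation is the standard pinhole back-projection; a minimal sketch with the same placeholder intrinsics as above:

// Sketch: back-project pixel (u, v) assuming d is the forward distance,
// i.e. the z coordinate. x and y scale linearly with depth.
void depthToXYZ(int u, int v, double d,
                double& x, double& y, double& z) {
  const double f = 580.0, cx = 320.0, cy = 240.0;  // placeholder intrinsics
  x = (u - cx) * d / f;
  y = (v - cy) * d / f;
  z = d;
}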

For reference, here is the map generated if I do not scale the x and y coordinates by the depth, i.e. I take d as the z coordinate and plot (x, y, z) directly. Note the rectangular shape of the room map, since points are not assumed to lie along rays cast from the sensor.

Figure 3: Original Image

Based on the above images, it appears that either Figure 2 or Figure 3 could be correct. Does anyone know what preprocessing libfreenect performs on captured data points before storing them in this matrix? I have looked online but haven't found any documentation on it. Thanks in advance for any help; I would be glad to supply any additional information required.

TimD1

1 Answer


All of libfreenect's depth formats produce values where d represents distance from the camera. Two special formats include some useful preprocessing:

  • FREENECT_DEPTH_MM produces distance in millimeters.
  • FREENECT_DEPTH_REGISTERED produces distance in millimeters, with each depth pixel (x, y) aligned to the same (x, y) in the video image.
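For example, one of these formats can be selected before starting the depth stream. A minimal sketch using the standard C API, assuming a freenect_device* dev that has already been opened:

#include <libfreenect.h>

// Request registered depth (millimeters, aligned to the RGB image).
freenect_frame_mode mode = freenect_find_depth_mode(
    FREENECT_RESOLUTION_MEDIUM, FREENECT_DEPTH_REGISTERED);
freenect_set_depth_mode(dev, mode);
freenect_start_depth(dev);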

The results can be scaled manually into world coordinates, but that may not be accurate across different hardware models. A more robust approach is to use the helper function exposed via libfreenect_registration.h:

FREENECTAPI void freenect_camera_to_world(freenect_device* dev,
               int cx, int cy, int wz, double* wx, double* wy);

Given a depth array, we could convert it to a point cloud.

#include <vector>
#include <opencv2/core/core.hpp>
#include <libfreenect_registration.h>

// dev, mode, and depth come from the usual libfreenect setup
// (device opened, depth mode selected, frame received in the depth callback).
int i = 0;
std::vector<cv::Point3d> points(mode.width * mode.height);
for (int y = 0; y < mode.height; y++) {
  for (int x = 0; x < mode.width; x++) {
    double wx, wy;
    int z = depth[i];  // depth in millimeters with the formats above
    freenect_camera_to_world(dev, x, y, z, &wx, &wy);
    points[i] = cv::Point3d(wx, wy, z);
    i++;
  }
}

This should produce a result similar to your Figure 2. To call it from Python, the function will need to be forwarded through the Python wrapper.

piedar