
I have an RGB image of shape (h, w, 3) and a corresponding depth map of shape (h, w). Thus I know, for each pixel, its 3D coordinates.
I would like to rotate the image by some 3D rotation matrix.
I know how to apply the rotation to the input coordinates and get the coordinates in the target view, but how do I render the new view given the input image pixel values?

I tried using scipy's griddata, but this "fills in" gaps over occluded regions: it interpolates between the scattered points rather than rendering the new view.
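
For concreteness, this is roughly what I tried; a minimal sketch, where fx, fy, cx, cy are placeholder pinhole intrinsics and R is the rotation matrix:

import numpy as np
from scipy.interpolate import griddata

def warp_view_griddata(rgb, depth, R, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # unproject every pixel to 3D camera coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # rotate, then project back with the same intrinsics
    pts = pts @ R.T
    u2 = fx * pts[:, 0] / pts[:, 2] + cx
    v2 = fy * pts[:, 1] / pts[:, 2] + cy
    # interpolate the scattered samples back onto the pixel grid,
    # one channel at a time; this is where occluded regions get
    # "filled in" instead of being hidden
    out = np.zeros((h, w, 3))
    for ch in range(3):
        out[..., ch] = griddata(
            np.stack([u2, v2], axis=-1),
            rgb[..., ch].reshape(-1).astype(np.float64),
            (u, v), method='linear', fill_value=0)
    return out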

Is there a better way to render the new rotated view in pytorch or numpy?

Shai
  • I don't think there is an easy general solution, especially with occlusion or gaps. Maybe this helps: https://towardsdatascience.com/5-step-guide-to-generate-3d-meshes-from-point-clouds-with-python-36bad397d8ba – Alexey Birukov Mar 23 '22 at 09:48
  • @AlexeyBirukov thanks! I'll look into it. – Shai Mar 23 '22 at 12:34
  • I am actually working on the same problem, but with the added complication that we go from image to world. Basically, like you say, once we have the 3D points, we can associate them with the RGB values of the original image. I have been trying this with open3d - you can create a point cloud from depth and rgb. You can now treat this point cloud as an 'image' with associated pixels (that's a programming exercise). After that we need to collect points that fall on the same z value and do some kind of colorization (say, max pooling or averaging). See this https://arxiv.org/abs/1812.05784 – kakrafoon Jan 07 '23 at 17:19

1 Answer


Here's some code that would do the association.

import numpy as np

def get_colored_point_cloud(calib, rgb, depth):
    """
    Pass in an rgb image and its associated depth map;
    return a point cloud and a color for each point.
    cloud.shape -> (num_points, 3) for [x, y, z]
    colors.shape -> (num_points, 3) for [r, g, b]
    """
    rows, cols = depth.shape
    # create a pixel grid and stack it with depth and rgb
    c, r = np.meshgrid(np.arange(cols), np.arange(rows))  # c, r -> (rows, cols)
    points = np.stack([c, r, depth])   # -> (3, rows, cols)
    colors = np.stack([c, r, rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]])  # -> (5, rows, cols)
    points = points.reshape(3, -1)     # -> (3, num_points)
    colors = colors.reshape(5, -1)     # -> (5, num_points)
    points = points.T                  # -> (num_points, 3)
    colors = colors.T                  # -> (num_points, 5)
    # now transform [u, v, z] to [x, y, z] by camera unprojection
    cloud = unproject_image_to_point_cloud(points, calib.intrinsic_params)  # -> (num_points, 3)
    return cloud, colors[:, 2:5]       # (num_points, 3), (num_points, 3)
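
As a quick usage sketch (R below is a hypothetical 3x3 rotation matrix, not defined in this answer), the returned cloud can then be rotated and the colors carried along row-for-row:

cloud, colors = get_colored_point_cloud(calib, rgb, depth)
rotated_cloud = cloud @ R.T  # R: 3x3 rotation matrix (assumed given)
# colors[i] still corresponds to rotated_cloud[i]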

It is also possible to do this through open3d, but you will have to deal with the practicalities of setting up the desired camera view for it to work there.

See this post: Generate point cloud from depth image
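
Along those lines, a minimal open3d sketch; the focal lengths below are placeholders, and rgb/depth are assumed to be a uint8 image and a float32 depth map in meters:

import numpy as np
import open3d as o3d

# assumed inputs: rgb as uint8 (h, w, 3), depth as float32 (h, w) in meters
h, w = depth.shape
color_img = o3d.geometry.Image(np.ascontiguousarray(rgb))
depth_img = o3d.geometry.Image(depth)
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color_img, depth_img, depth_scale=1.0, convert_rgb_to_intensity=False)
# placeholder pinhole intrinsics: fx = fy = 525.0, principal point at the center
intrinsic = o3d.camera.PinholeCameraIntrinsic(w, h, 525.0, 525.0, w / 2, h / 2)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
o3d.visualization.draw_geometries([pcd])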

A more direct way of doing this, instead of the somewhat ugly meshgrid trick (at least the way I have written it), is to build separate arrays for point (col_index, row_index, z) and color (col_index, row_index, R, G, B), transforming (col_index, row_index, z) to (x, y, z) one point at a time in an unrolled loop. This is much slower, however, because it forgoes numpy's vectorization under the hood.

def get_colored_point_cloud(calib, rgb, depth):
    points = []
    colors = []

    rows, cols = depth.shape

    # unrolled per-pixel loop: slow, but easy to follow
    for i in range(rows):
        for j in range(cols):
            z = depth[i, j]
            r = rgb[i, j, 0]
            g = rgb[i, j, 1]
            b = rgb[i, j, 2]
            points.append([j, i, z])
            colors.append([r, g, b])

    points = np.asarray(points)
    colors = np.asarray(colors)
    cloud = unproject_image_to_point_cloud(
        points, calib.intrinsic_params)  # -> (num_points, 3)
    return cloud, colors
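
The code above leaves unproject_image_to_point_cloud undefined; based on the pinhole model described in the comments below, a minimal version might look like this (assuming intrinsic_params exposes fx, fy, cx, cy, which is an assumption on my part):

def unproject_image_to_point_cloud(points, intrinsic_params):
    # points: (num_points, 3) rows of [u, v, z] pixel coordinates plus depth
    # intrinsic_params: assumed to expose fx, fy, cx, cy (not defined in the answer)
    u, v, z = points[:, 0], points[:, 1], points[:, 2]
    x = (u - intrinsic_params.cx) * z / intrinsic_params.fx
    y = (v - intrinsic_params.cy) * z / intrinsic_params.fy
    return np.stack([x, y, z], axis=1)  # -> (num_points, 3)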

kakrafoon
  • As @AlexeyBirukov says, this is not going to work well with occlusions. It is tantamount to applying a homography, in a sense. – kakrafoon Jan 08 '23 at 22:47
  • Thanks for the answer. What does the function `unproject_to_point_cloud` do? why do you append `r, c` to `colors` just to remove it at the end? – Shai Jan 09 '23 at 06:43
  • The `unproject_to_point_cloud` function uses camera intrinsics to transform `(u, v)` image coordinates to `(x, y, z)` world coordinates. It involves a multiplication by depth and a division by focal length (the inverse of the world-to-camera transformation). Elaborated here https://www.cse.psu.edu/~rtc12/CSE486/lecture12.pdf. – kakrafoon Jan 20 '23 at 15:41
  • Appending to `colors` creates an array of shape `(5, num_points)` (it stacks cols, rows, and the color channels, the last of which has shape (3, num_points)), so we only pick (R, G, B) from it at the end. Not the neatest code, but it's much faster than unrolling by hand. I adapted these from the kitti object vis helpers https://github.com/kuixu/kitti_object_vis/blob/master/kitti_util.py – kakrafoon Jan 20 '23 at 16:00