
I have a height map for an image, which tells me the offset of each pixel in the Z direction. My goal is to flatten the distorted image using only its height map.

How would I go about doing this? I know the position of the camera, if that helps.


To do this, I was thinking of treating each pixel as a point on a plane and then translating each of those points vertically according to the Z-value from the height map. Viewed from above (i.e. from the camera's perspective), that vertical translation makes each point appear to move around in the image plane.

From that projected shift, I could extract the X and Y shift of each pixel, which I could feed into cv.Remap().

But I have no idea how I could get the projected 3D offset of a point with OpenCV, let alone construct an offset map out of it.
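One possible route, sketched with the newer cv2 bindings (rvec, tvec and camera_matrix are hypothetical names, assumed to come from a prior calibration step; the legacy cv module has the equivalent cv.ProjectPoints2):

import numpy as np
import cv2

def projected_shift(x, y, z, rvec, tvec, camera_matrix):
    # project the lifted point (x, y, z) through the calibrated camera
    pt3d = np.array([[[x, y, z]]], dtype=np.float32)
    pt2d, _ = cv2.projectPoints(pt3d, rvec, tvec, camera_matrix, None)
    u, v = pt2d[0, 0]
    return u - x, v - y  # the X and Y shift to feed into cv.Remap()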


Here are my reference images for what I'm doing:

[Calibration image]  [Warped image]

I know the angle of the lasers (45 degrees), and from the calibration images, I can calculate the height of the book really easily:

h(x) = sin(theta) * abs(calibration(x) - actual(x))
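As a minimal sketch of that formula in numpy, assuming calibration and actual are per-column arrays of the laser line's pixel position (the example values here are hypothetical):

import numpy as np

theta = np.radians(45)  # laser angle

# per-column laser line positions (hypothetical example values)
calibration = np.array([240.0, 240.0, 241.0])
actual = np.array([300.0, 310.0, 305.0])

heights = np.sin(theta) * np.abs(calibration - actual)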

I do this for both laser lines and linearly interpolate between them to generate a surface, using this approach (Python code; it's inside a loop):

# note: the weights are not normalized; dividing by the image height
# would give a true linear interpolation
height_grid[x][y] = heights_top[x] * (cv.GetSize(image)[1] - y) + heights_bottom[x] * y

I hope this helps ;)


Right now, this is what I have to dewarp the image. All that strange stuff in the middle projects a 3D coordinate onto the camera plane, given its position (and the camera's location, rotation, etc.):

class Point:
  """A simple 3D point."""
  def __init__(self, x = 0, y = 0, z = 0):
    self.x = x
    self.y = y
    self.z = z

# per-pixel source coordinates for cv.Remap
mapX = cv.CreateMat(cv.GetSize(image)[1], cv.GetSize(image)[0], cv.CV_32FC1)
mapY = cv.CreateMat(cv.GetSize(image)[1], cv.GetSize(image)[0], cv.CV_32FC1)

c = Point(CAMERA_POSITION[0], CAMERA_POSITION[1], CAMERA_POSITION[2])      # camera position
theta = Point(CAMERA_ROTATION[0], CAMERA_ROTATION[1], CAMERA_ROTATION[2])  # camera Euler angles
d = Point()                                                                # scratch point in camera space
e = Point(0, 0, CAMERA_POSITION[2] + SENSOR_OFFSET)                        # viewer position relative to the sensor

costx = cos(theta.x)
costy = cos(theta.y)
costz = cos(theta.z)

sintx = sin(theta.x)
sinty = sin(theta.y)
sintz = sin(theta.z)


for x in xrange(cv.GetSize(image)[0]):
  for y in xrange(cv.GetSize(image)[1]):

    # height of this pixel, interpolated between the two laser lines
    a = Point(x, y, heights_top[x / 2] * (cv.GetSize(image)[1] - y) + heights_bottom[x / 2] * y)

    # rotate (a - c) into camera space using the camera's Euler angles
    d.x = costy * (sintz * (a.y - c.y) + costz * (a.x - c.x)) - sinty * (a.z - c.z)
    d.y = sintx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) + costx * (costz * (a.y - c.y) - sintz * (a.x - c.x))
    d.z = costx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) - sintx * (costz * (a.y - c.y) - sintz * (a.x - c.x))

    # perspective-project and store the shifted source coordinates for cv.Remap
    mapX[y, x] = x + (d.x - e.x) * (e.z / d.z)
    mapY[y, x] = y + (d.y - e.y) * (e.z / d.z)
    

print
print 'Remapping original image using map...'

remapped = cv.CreateImage(cv.GetSize(image), 8, 3)
cv.Remap(image, remapped, mapX, mapY, cv.CV_INTER_LINEAR)

This is turning into a huge thread of images and code now... Anyways, this code chunk takes me 7 minutes to run on an 18 MP camera image; that's way too long, and in the end, this approach does nothing visible to the image (the offset of each pixel is << 1).
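On the speed front, nearly all of those 7 minutes are spent in the per-pixel Python loop. Here is a minimal sketch of the same math vectorized with numpy, reusing the names from the code above (it assumes heights_top and heights_bottom are numpy arrays, and uses cv.fromarray from the OpenCV 2.x bindings to wrap the results for cv.Remap):

import numpy as np

w, h = cv.GetSize(image)
xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                     np.arange(h, dtype=np.float32))

# same interpolation as the loop above (height arrays sampled at x / 2)
col = (xs / 2).astype(int)
az = heights_top[col] * (h - ys) + heights_bottom[col] * ys - c.z
ax = xs - c.x
ay = ys - c.y

# same Euler rotation, applied to whole arrays at once
dx = costy * (sintz * ay + costz * ax) - sinty * az
dy = sintx * (costy * az + sinty * (sintz * ay + costz * ax)) + costx * (costz * ay - sintz * ax)
dz = costx * (costy * az + sinty * (sintz * ay + costz * ax)) - sintx * (costz * ay - sintz * ax)

map_x = (xs + (dx - e.x) * (e.z / dz)).astype(np.float32)
map_y = (ys + (dy - e.y) * (e.z / dz)).astype(np.float32)

remapped = cv.CreateImage(cv.GetSize(image), 8, 3)
cv.Remap(image, remapped, cv.fromarray(map_x), cv.fromarray(map_y), cv.CV_INTER_LINEAR)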

Any ideas?

– Blender
3 Answers


I ended up implementing my own solution:

for x in xrange(cv.GetSize(image)[0]):
  for y in xrange(cv.GetSize(image)[1]):

    # height of this pixel, interpolated between the two laser lines
    a = Point(x, y, heights_top[x / 2] * (cv.GetSize(image)[1] - y) + heights_bottom[x / 2] * y)

    # rotate (a - c) into camera space using the camera's Euler angles
    d.x = costy * (sintz * (a.y - c.y) + costz * (a.x - c.x)) - sinty * (a.z - c.z)
    d.y = sintx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) + costx * (costz * (a.y - c.y) - sintz * (a.x - c.x))
    d.z = costx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) - sintx * (costz * (a.y - c.y) - sintz * (a.x - c.x))

    # perspective-project; the 100.0 amplifies the sub-pixel offsets
    mapX[y, x] = x + 100.0 * (d.x - e.x) * (e.z / d.z)
    mapY[y, x] = y + 100.0 * (d.y - e.y) * (e.z / d.z)


print
print 'Remapping original image using map...'

remapped = cv.CreateImage(cv.GetSize(image), 8, 3)
cv.Remap(image, remapped, mapX, mapY, cv.CV_INTER_LINEAR)

This (slowly) remaps each pixel using the cv.Remap function, and it seems to kind of work... The only change from the code in the question is the 100.0 scale factor, which amplifies the sub-pixel offsets.

– Blender

Distortion based on distance from the camera only happens with a perspective projection. If you have the (x,y,z) position of a pixel, you can use the projection matrix of the camera to unproject the pixels back into world-space. With that information, you can render the pixels in an orthographic way. However, you may have missing data, due to the original perspective projection.
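A minimal sketch of that unprojection step, assuming a pinhole camera with intrinsic matrix K and pose R, t (all hypothetical names, e.g. from a calibration step): a pixel back-projects to a ray, and intersecting the ray with the known height gives the world-space point.

import numpy as np

def backproject(u, v, K, R, t, z_world):
    # viewing ray through pixel (u, v), expressed in world coordinates
    ray = R.T.dot(np.linalg.inv(K).dot([u, v, 1.0]))
    origin = -R.T.dot(t)                # camera center in world coordinates
    s = (z_world - origin[2]) / ray[2]  # scale so the ray reaches z = z_world
    return origin + s * ray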

– tkerwin
  • Can OpenCV map 3D into 2D? Or do I have to come up with my own formula for this? I'll try implementing this, though, so thanks! – Blender Mar 02 '11 at 20:59

Separate your scene out as follows:

  • you have an unknown bitmap image I(x,y) -> (r,g,b)
  • you have a known height field H(x,y) -> h
  • you have a camera transform C(x,y,z) -> (u,v) which projects the scene to a screen plane

Note that the camera transform throws information away (you do not get a depth value for each screen pixel). You may also have bits of scene overlap on screen, in which case only the foremost gets shown - the rest is discarded. So in general this is not perfectly reversible.

  • you have a screenshot S(u,v) which is a result of C(x,y,H(x,y)) for x,y in I
  • you want to generate a screenshot S'(u',v') which is a result of C(x,y,0) for x,y in I

There are two obvious ways to approach this; both depend on having accurate values for the camera transform.

  1. Ray-casting: for each pixel in S, cast a ray back into the scene. Find out where it hits the heightfield; this gives you (x,y) in the original image I, and the screen pixel gives you the color at that point. Once you have as much of I as you can recover, re-transform it to find S'.

  2. Double-rendering: for each x,y in I, project to find (u,v) and (u',v'). Take the pixel-color from S(u,v) and copy it to S'(u',v').

Both methods will have sampling problems, which can be helped by super-sampling or interpolation; method 1 will leave empty spaces in occluded areas of the image, while method 2 will 'project through' from the first surface.
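A rough sketch of method 2 (double-rendering), assuming a project(x, y, z) function implementing the camera transform C, a height field H the same size as the screenshot S, and plain nearest-pixel sampling (no super-sampling):

import numpy as np

def double_render(S, H, project):
    S2 = np.zeros_like(S)                   # the flattened screenshot S'
    rows, cols = H.shape
    for y in range(rows):
        for x in range(cols):
            u, v = project(x, y, H[y, x])   # where the lifted point landed in S
            u2, v2 = project(x, y, 0.0)     # where the flat point lands in S'
            u, v = int(round(u)), int(round(v))
            u2, v2 = int(round(u2)), int(round(v2))
            if 0 <= v < rows and 0 <= u < cols and 0 <= v2 < rows and 0 <= u2 < cols:
                S2[v2, u2] = S[v, u]        # copy the color across
    return S2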

Edit:

I had presumed you meant a CG-style heightfield, where each pixel in S is directly above the corresponding location in S'; but this is not how a page drapes over a surface. A page is fixed at the spine and is non-stretchy - lifting the center of a page pulls the free edge toward the spine.

Based on your sample image, you'll have to reverse this cumulative pulling: detect the spine centerline's location and orientation, then work progressively left and right, finding the change in height across the top and bottom of each vertical strip of page, calculating the resulting aspect-narrowing and skew, and reversing it to re-create the original flat page.

– Hugh Bothwell
  • I edited my question accordingly. I'll also include reference images too, just so that you can see what I mean. – Blender Mar 02 '11 at 21:53
  • Yes, the sample images are a great help. A couple of thoughts: first, you can make the image nearly-orthogonal to start with by using a telephoto lens and shooting from as far back as possible. Second, the pages are resting with some vertical skew to them - resting the bottom edge against a planar surface could reduce or eliminate this. Then the image correction ends up being just width correction by arccosine of page incidence angle (ie really simple). – Hugh Bothwell Mar 02 '11 at 22:25
  • I'm stuck with only a 3X zoom lens, so I'll have to live with manually correcting for tangential and radial warping. Could you elaborate a bit more on the `arccos()` method? I'm not quite getting it. – Blender Mar 02 '11 at 23:09
  • Consider a thin vertical slice of page. The apparent width of this slice (as viewed by the camera) varies with the cosine of the angle at which the page is tilted (if the page is flat, it appears 100% width, at 45 degrees it appears 70.7% width, etc). So if you know the angle of tilt, multiply the apparent width by 1/cos(angle) to get the actual width. – Hugh Bothwell Mar 03 '11 at 02:07
  • Oh, I was thinking about the page dewarping ;) The camera tilt is covered by OpenCV's camera calibration, so I'm not worrying too much about it. I tried implementing what I was describing, and it was quite abortive. My input/output images differed by literally a pixel... – Blender Mar 03 '11 at 03:36