
I have a photo of a Go board, which is basically a grid of n*n squares, each of size a. Depending on how the image was taken, the grid can have either one vanishing point like this (n = 15, board size b = 15*a):

Grid with one vanishing point

or two vanishing points like this (n = 9, board size b = 9*a):

Grid with two vanishing points

So what is available to me are the four screen space coordinates of the four corners of the flat board: p1, p2, p3, p4.

What I would like to do is to calculate the corresponding four screen space coordinates q1, q2, q3, q4 of the corners of the board, if the board was moved 'upward' (perpendicular to the plane of the board) in world space by a, or in other words the coordinates on top of the board, if the board had a thickness of a.

Is the information about the four points even sufficient to calculate this?

If this is not enough information, maybe it would help to make the assumption that the distance of the camera to the center of the board is typically of the order of 1.5 or 2 times the board size b?

From my understanding, the four lines p1-q1, p2-q2, p3-q3, p4-q4 would all go through the same (yet unknown) vanishing point, located somewhere below the board.

Maybe a sufficient approximation for the direction of each of the lines p1-q1, p2-q2, ... in screen space (because typically for a Go board n = 18, so the square size a is small compared to the board size) would be to simply choose a line perpendicular to the horizon (given by the two vanishing points vp1-vp2, or by p1-p2 in the case of only one vanishing point)?

Having made this approximation, still the length of the four lines p1-q1, p2-q2, p3-q3, p4-q4 would need to be calculated ...
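For what it's worth, the vanishing points and the horizon-perpendicular direction described above only need plain 2D line intersections. A minimal sketch in Python (the function name and example coordinates are mine, not from the question):

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines p1-p2 and p3-p4, or None if
    they are (nearly) parallel, i.e. the vanishing point is at infinity."""
    x1, y1 = p1; x2, y2 = p2; x3, y3 = p3; x4, y4 = p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-12:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / d
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

# With corners p1..p4 given in order around the board, the two
# vanishing points are
#   vp1 = line_intersection(p1, p2, p4, p3)
#   vp2 = line_intersection(p2, p3, p1, p4)
# (either may be None); the horizon is the line vp1-vp2, and the
# approximate 'up' direction is its perpendicular.
```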

Any hints are highly appreciated!

PS: I am using Objective-C & OpenCV

r3mainer
Jekapa
  • Do you know your camera FOV (both x,y) or focal length? If yes, you can fit a 3D plane to your 4 points (in world coordinates) and from that get the 3D coordinates of your points; then you just offset their y coordinates by your height difference and apply the camera perspective again ... – Spektre Apr 13 '18 at 08:02
  • @Spektre No, I do not have the camera position - only the four corner points (screen coordinates) – Jekapa Apr 13 '18 at 19:00
  • FOV is field of view, not position; it means what visual angle is covered by the camera in `x` and `y` directions. If you do not know it but know that a pixel has the same angular size in x and y, then maybe this could degenerate to relative ratio units, which is still enough. – Spektre Apr 14 '18 at 07:02
  • Another possibility is to use `VP` or `VP1,VP2`, which are easy to compute, but for that you need to know the `a` height in pixels for each of the four `p(i)` points. For the `VP` case it is easy, but for `VP1,VP2` you have to apply the inverse of the perspective-correct texture mapping computation ... for which you need more than just 4 points; luckily [bullet #2 in here](https://stackoverflow.com/a/39316776/2521214) might help to obtain the other points. Do you have a sample image for testing for each case? – Spektre Apr 14 '18 at 07:09
  • See [Why does it take 5 points to construct a projective frame in ℝ⁴](https://math.stackexchange.com/q/740767/35416) for why 4 points in a plane is not enough to deduce things outside that plane. – MvG Apr 14 '18 at 10:39
  • 1
    You can count degrees of freedom. A transformation to 2d from 3d homogeneous coordinates is represented by a 3×4 matrix, which may be scaled without affecting the transformations. So you'd have 12 matrix elements, and 11 real degrees of freedom affecting the transformation. Knowing 4 x/y coordinates means you have 8 items of information and thus are short 3 more. Knowing one more distance, as in ratio between board size and camera, would account for one more. Which means without additional assumptions you are still 2 degrees of freedom short of unique. – MvG Apr 14 '18 at 10:40
  • @Spektre: With 4 points in the plane of the board, you have fully determined the map from that plane to the image plane. See https://math.stackexchange.com/a/339033/35416 for details. So any information from within that plane can only be used to confirm existing knowledge and reduce errors, but it doesn't tell you anything about what's happening outside that plane. In https://math.stackexchange.com/a/2093168/35416 I wrote a bit about how collinear or coplanar points can yield less information. I realized that if you only need the orthogonal direction, not a scale on it, you need one DOF less. – MvG Apr 19 '18 at 00:01
  • @Spektre: Continuing the previous: if you knew that the principal point were actually the center of the picture, i.e. that the line between the lens and that point were orthogonal to the sensor, that might be of some use. Knowing that orthogonal point in the picture does indeed give you 2 constraints which might be independent from the others. I haven't done any actual computations; chances are you still end up with more than one possible solution, but perhaps a finite number. – MvG Apr 19 '18 at 00:03
  • @MvG I moved my comments and added some stuff into (for now not an) answer. Will update it if I make more progress. – Spektre Apr 19 '18 at 08:00

1 Answer


Not yet a full answer, but this might help to move forward. As MvG pointed out, 4 points alone are not enough. Luckily we know the board is a square, so even with perspective distortion the diagonals in 2D should/will intersect at the board center (unless serious fish-eye or other distortions are present in the image). Here is a test image (created with OpenGL) that I used as test input:

mid point

The grayish surface is a 2D quad drawn from the 2D perspective-distorted corner points (your input). The aqua/bluish grid is the 3D OpenGL grid I created the 2D corner points from (to check that they match). The green lines are the 2D diagonals, and the orange points are the 2D corner points and the diagonal intersection. As you can see, the 2D diagonal intersection corresponds exactly to the 3D board's middle cell center.
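Assuming the corners come in order around the board, that diagonal intersection is a single line-line intersection; a small sketch (pure Python, function name is mine):

```python
def board_center(p1, p2, p3, p4):
    """Screen-space intersection of the diagonals p1-p3 and p2-p4,
    which (under pure perspective, no lens distortion) is the image
    of the 3D board center."""
    ax, ay = p1; bx, by = p3      # first diagonal
    cx, cy = p2; dx, dy = p4      # second diagonal
    den = (ax - bx) * (cy - dy) - (ay - by) * (cx - dx)
    t = ((ax - cx) * (cy - dy) - (ay - cy) * (cx - dx)) / den
    return (ax + t * (bx - ax), ay + t * (by - ay))
```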

Now we can use the ratio between the half-diagonal lengths to assume/fit the perspective. If we handle cell coordinates in the range <0,9>, we want to achieve a further division of the half diagonals like this:

uv

I am still not sure how exactly (a linear ratio l0/(l0+l1) is not working), so I need to inspect the perspective mapping equations to find the relative-ratio dependence and compute its inverse (when I have time and mood for this).
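For what it's worth, I believe the missing relation is standard perspective-correct interpolation: screen position varies with t/w, not linearly with t. If s_mid is the fraction along the screen diagonal at which the diagonal intersection lies, the depth ratio and the screen fraction for any world fraction t follow as below (a sketch under an ideal pinhole assumption; names are mine):

```python
def perspective_param(t, s_mid):
    """Map a world-space fraction t in [0,1] along the diagonal to its
    screen-space fraction s.  s_mid is where the world midpoint (t = 1/2,
    i.e. the diagonal intersection) lies on the screen segment.
    Derivation: s(t) = t*w0 / (t*w0 + (1-t)*w1), and the depth ratio
    r = w1/w0 is recovered from s(1/2) = s_mid  =>  r = (1 - s_mid)/s_mid."""
    r = (1.0 - s_mid) / s_mid
    return t / (t + (1.0 - t) * r)

# Cell-edge fractions along one diagonal for a 9x9 board:
# edges = [perspective_param(k / 9.0, s_mid) for k in range(10)]
```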

If that works out, then we can compute any point along the diagonals (we want the cell edges). Once that is done, we can easily compute the visual size of any cell of size a and use the vanishing point without any 3D transform matrices at all.

In case this is not doable, there is still the option to use DIP/CV techniques to detect the cell crossings, like this:

using just bullet #2, but for that you need to take into account the type of images you will have, and adjust the detector or add preprocessing for it ...

Now back to your offsetting: you can simply offset your cells up by the visual size of the cell, like this:

offset

And handle the left-side points (either interpolate the size or use the same as the neighboring cell). That should work unless too weird angles of the board are used.
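The "offset up by the visual cell size" step can be sketched like this, using the perpendicular-to-horizon approximation from the question (pure Python; function and parameter names are mine):

```python
import math

def offset_corner(p, horizon_dir, cell_height):
    """Shift screen point p 'up' by the visual height of its adjacent
    cell, along the unit normal of the horizon direction.  Image y is
    assumed to grow downward, so 'up' means decreasing y."""
    hx, hy = horizon_dir
    n = math.hypot(hx, hy)
    ux, uy = -hy / n, hx / n          # unit normal to the horizon
    if uy > 0:                        # flip so we move toward smaller y
        ux, uy = -ux, -uy
    return (p[0] + cell_height * ux, p[1] + cell_height * uy)
```

Applied to each of the four corners with its local cell height, this gives the q1..q4 the question asks for (up to the stated approximation).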

Spektre