For my undergraduate paper I am working on an iPhone application that uses OpenCV to detect domino tiles. The detection works well at close range, but when the camera is angled, tiles far away are difficult to detect. My approach to solving this would be to do some spatial calculations: convert a 2D pixel value into world coordinates, calculate a new 3D position with a vector, convert those coordinates back to 2D, and then check the colour/shape at that position.

Additionally, I would need to know the 3D positions for augmented reality additions.

The camera matrix I got through this link: create opencv camera matrix for iPhone 5 solvepnp.

The rotation matrix of the camera I get from Core Motion.

Using ArUco markers would be my last resort, as I wouldn't get the desired effect that I need for the paper.

Now my question is: can I not make these calculations when I know the locations of and distances between the circles on, let's say, a tile with a 5 on it? I wouldn't need a measurement in mm/inches; I can live with vectors without units.

The camera needs to be freely rotatable.

I tried to invert the projection equation s·m' = A[R|t]M' to be able to calculate the 3D coordinates from the 2D ones. But I am stuck with inverting [R|t], even on paper, and I don't know how I'd do that in Swift or C++ either.

I have read so many different posts on forums, in books, etc., and I am completely stuck. I appreciate any help/input you can give me; otherwise I'm screwed.

Thank you so much for your help.

Update:

By using solvePnP, as suggested by Micka, I was able to get the rotation and translation vectors for the camera pose. This means that if you can identify multiple 2D points in your image and know their respective 3D world coordinates (in mm, cm, inches, ...), then you can project points from known 3D world coordinates onto the respective 2D coordinates in your image (use the OpenCV projectPoints function).
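
As a minimal sketch of that round trip (all numeric values here are placeholders, not my real calibration data):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

using namespace cv;
using namespace std;

int main()
{
    // 3D world coordinates of four reference points on the tile plane (Z = 0)
    vector<Point3f> objectPoints = {
        {0, 0, 0}, {20, 0, 0}, {20, 40, 0}, {0, 40, 0}
    };

    // detected 2D pixel positions of the same four points (placeholders)
    vector<Point2f> imagePoints = {
        {812, 410}, {1120, 402}, {1135, 980}, {830, 995}
    };

    // intrinsics from calibration (placeholder values, distortion ignored)
    Mat cameraMatrix = (Mat_<double>(3, 3) <<
        2880,    0, 1536,
           0, 2880, 1152,
           0,    0,    1);
    Mat distCoeffs = Mat::zeros(4, 1, CV_64F);

    // recover the camera pose relative to the reference points
    Mat rvec, tvec;
    solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);

    // project any other known 3D point on that plane into the image
    vector<Point3f> worldPts = { {10, 20, 0} };
    vector<Point2f> imgPts;
    projectPoints(worldPts, rvec, tvec, cameraMatrix, distCoeffs, imgPts);
    cout << "projected: " << imgPts[0] << endl;
}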

What is up next for me to solve is the translation from 2D into 3D coordinates, where I need to follow ozlsn's approach of inverting the matrices received from solvePnP.

Update 2: With a top-down view I am getting along quite well and am able to detect the tiles and their positions in the 3D world: tile from top down

However, if I now angle the view, my calculations are not working anymore. For example, I check the bottom edge of a 9-dot group and the center of the black division bar for 90° angles. If Corner1 -> Middle Edge -> Bar Center and Corner2 -> Middle Edge -> Bar Center are both 90° angles, then the bar in the middle is found and the position of the tile can be determined.

When the view is angled, these angles are shifted by the perspective to, let's say, 130° and 50°. (I'll provide an image later.)

The idea I have now is to run solvePnP on 4 points (bottom edge plus middle) and then rotate the needed dots and the center bar from their 2D positions to 3D positions (height should be irrelevant?). Then I could check with the transformed points whether the angles are 90° and also do other needed distance calculations.

Here is an image of what I am trying to accomplish: Markings for Problem

I first find the 9 dots and arrange them. For each edge I try to find the black bar. As said above, seen from the top, the angle blue corner -> green middle edge -> yellow bar center is 90°. However, as the camera is angled, the angle is not 90° anymore. I also cannot just check whether the two angles sum to 180°; that would give me false positives. So I wanted to do the following steps:

  1. Detect the center
  2. Detect the edges (3 dots)
  3. solvePnP with those 4 points
  4. Rotate the edge and center points (coordinates) to 3D positions
  5. Measure the angles (check if both are 90°; see the sketch below)
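
Step 5 then reduces to measuring the angle between the two rays that meet at the middle-edge point. A minimal sketch (the identifiers in the usage comment are the ones from the homography snippet further below):

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>

using namespace cv;

// angle at vertex b formed by the rays b->a and b->c, in degrees;
// the same formula works for Point3f once the points are lifted to 3D
static double angleAt(Point2f a, Point2f b, Point2f c)
{
    Point2f v1 = a - b, v2 = c - b;
    double cosang = v1.ddot(v2) /
        (std::sqrt(v1.ddot(v1)) * std::sqrt(v2.ddot(v2)));
    cosang = std::max(-1.0, std::min(1.0, cosang)); // clamp rounding noise
    return std::acos(cosang) * 180.0 / CV_PI;
}

// usage: the bar is found if angleAt(corner1, edgeMiddle, possibleBar.center)
// and angleAt(corner2, edgeMiddle, possibleBar.center) are both close to 90°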

Now I wonder how I can transform the 2D coordinates of those points to 3D. I don't care about absolute distances, as I am just calculating them relative to others (like 1.4 times the middle-edge distance), etc. If I could measure the distances in mm, that would be even better though; it would give me better results.

With solvePnP I get the rvec, which I can turn into the rotation matrix (with Rodrigues(), I believe). To measure the angles, my understanding is that I don't need to apply the translation (tvec) from solvePnP.
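
A minimal sketch of that step (assuming rvec is the output of solvePnP; whether you apply R or its transpose only changes the direction of the rotation, and angles are preserved either way):

#include <opencv2/opencv.hpp>

using namespace cv;

// rotate a point with the pose from solvePnP, ignoring tvec:
// translation only shifts points, so angles between vectors are unchanged
static Point3f rotateOnly(const Mat& rvec, const Point3f& p)
{
    Mat R;
    Rodrigues(rvec, R); // 3x1 rotation vector -> 3x3 rotation matrix
    // R maps world (tile) coordinates into the camera frame;
    // R.t() goes the opposite way ("view from the top")
    Mat v = R.t() * (Mat_<double>(3, 1) << p.x, p.y, p.z);
    return Point3f(v.at<double>(0), v.at<double>(1), v.at<double>(2));
}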

This leads to my last question: when using the iPhone, can't I use the angles from the motion sensors to build the rotation matrix beforehand and only use that to rotate the tile so it is seen from the top? I feel this would save me a lot of CPU time, since I wouldn't have to run solvePnP for each tile (there can be up to about 100 tiles).
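
What I have in mind is something like the following sketch (in C++ for consistency; the roll/pitch/yaw values would come from Core Motion's deviceMotion.attitude on the Swift side, and the axis assignment and multiplication order are assumptions that would have to match the device's conventions):

#include <opencv2/opencv.hpp>
#include <cmath>

using namespace cv;

// compose a rotation matrix from roll/pitch/yaw angles (radians)
static Mat rotationFromAttitude(double roll, double pitch, double yaw)
{
    Mat Rx = (Mat_<double>(3, 3) <<
        1, 0, 0,
        0, std::cos(pitch), -std::sin(pitch),
        0, std::sin(pitch),  std::cos(pitch));
    Mat Ry = (Mat_<double>(3, 3) <<
         std::cos(roll), 0, std::sin(roll),
         0, 1, 0,
        -std::sin(roll), 0, std::cos(roll));
    Mat Rz = (Mat_<double>(3, 3) <<
        std::cos(yaw), -std::sin(yaw), 0,
        std::sin(yaw),  std::cos(yaw), 0,
        0, 0, 1);
    return Rz * Ry * Rx; // order is an assumption, not verified
}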

Find Homography

// detected 2D image positions of the four reference points
vector<Point2f> tileDots;
tileDots.push_back(corner1);
tileDots.push_back(edgeMiddle);
tileDots.push_back(corner2);
tileDots.push_back(middle.Dot->ellipse.center);

// corresponding coordinates on the tile plane (real-world units)
vector<Point2f> realLivePos;
realLivePos.push_back(Point2f(5.5,19.44));
realLivePos.push_back(Point2f(12.53,19.44));
realLivePos.push_back(Point2f(19.56,19.44));
realLivePos.push_back(Point2f(12.53,12.19));

// homography mapping the image points onto the tile plane
// (CV_RANSAC is called cv::RANSAC in newer OpenCV versions)
Mat M = findHomography(tileDots, realLivePos, CV_RANSAC);

cout << "M = "<< endl << " "  << M << endl << endl;

// the four reference points plus the bar candidate to transform
vector<Point2f> barPerspective;
barPerspective.push_back(corner1);
barPerspective.push_back(edgeMiddle);
barPerspective.push_back(corner2);
barPerspective.push_back(middle.Dot->ellipse.center);
barPerspective.push_back(possibleBar.center);
vector<Point2f> barTransformed;

// findHomography returns an empty matrix on failure
if (M.empty())
{
    cout << "No Homography found" << endl;
} else {
    perspectiveTransform(barPerspective, barTransformed, M);
}

This however gives me wrong values, and I don't know where to look anymore (I can't see the forest for the trees).

Image Coordinates https://i.stack.imgur.com/c67EH.png
World Coordinates https://i.stack.imgur.com/Im6M8.png
Points to Transform https://i.stack.imgur.com/hHjBM.png
Transformed Points https://i.stack.imgur.com/P6lLS.png

You see I am even too stupid to post 4 images here??!!?

The 4th index item should be at x 2007, y 717. I don't know what I am doing wrong here.

Update 3: I found the following post, Computing x,y coordinate (3D) from image point, which does exactly what I need. Maybe there is a faster way to do it, but I was not able to find one. At the moment I can do the checks, but I still need to test whether the algorithm is robust enough.
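
The core of that post's approach, as I understand it, is to choose the scale factor s so that the reconstructed world point lands on the plane Z = 0. A minimal sketch (it assumes an undistorted pixel and the CV_64F matrices returned by solvePnP):

#include <opencv2/opencv.hpp>

using namespace cv;

// back-project the pixel px onto the world plane Z = 0;
// run undistortPoints on px first if lens distortion is significant
static Mat backProjectToPlane(Point2f px, const Mat& rvec, const Mat& tvec,
                              const Mat& cameraMatrix)
{
    Mat R;
    Rodrigues(rvec, R);
    Mat uv = (Mat_<double>(3, 1) << px.x, px.y, 1.0);
    Mat lhs = R.inv() * cameraMatrix.inv() * uv;
    Mat rhs = R.inv() * tvec;
    double s = rhs.at<double>(2, 0) / lhs.at<double>(2, 0); // forces Z = 0
    return R.inv() * (s * cameraMatrix.inv() * uv - tvec);  // (X, Y, ~0)
}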

Result with SolvePnP to find bar Center

  • if you know where tile 5 is in your image and all the other space you want to observe is on the same plane as tile 5, you have everything you need. Use OpenCV's solvePnP function to determine the camera position/rotation and use plane coordinates to measure other objects' positions on that plane. Even better if you know the exact size of tile 5 – Micka Dec 30 '17 at 23:49
  • Yes, the other tiles are in the same plane. However, would I not need to know the distance of the "5" tile from the camera to be able to give the correct x,y,z coordinates? Or do I just give one of the points 0,0,0? – Maverick2805 Dec 31 '17 at 00:33
  • that tile (or any other reference object you choose) would need to be static or the camera has to be static if you want to fix the coordinate system. – Micka Dec 31 '17 at 09:43
  • the reference tile would be static for each analysis, but can change between analyses. I have made some progress with the solvePnP, thank you very much for that tip (will update the question above soon). I think I made quite some progress, but need to find a way to do conversions from 2D to 3D for some further calculations. – Maverick2805 Dec 31 '17 at 16:38
  • a short hint for 2D to 3D-plane: You can compute a homography from your pixel tile corners to the 3D plane coordinate system (should be the same 2D/3D coordinates you used in solvePnP) and with that homography you can compute pixel-to-plane for each pixel. – Micka Dec 31 '17 at 20:30
  • I couldn't get my head around that yet, as the tiles can lie around wildly on the plane and have different values. Moreover, I cannot guarantee being able to detect the rectangle representing a full stone. If they are glued together, then for far-away tiles the edges are not clearly detected. – Maverick2805 Jan 01 '18 at 15:12
  • ok, this only works if you have at least 4 known points on the plane. – Micka Jan 01 '18 at 18:32
  • can you add some pictures or drawings of your task? Maybe I have a wrong imagination of your setting. – Micka Jan 01 '18 at 19:07
  • Sorry for taking so long to answer; I had huge problems solving several mathematical issues to even be able to analyze the domino tiles from top down. This is working out quite well, I have to say, but now when I am going into angled views, it's all screwed up :) I'll add some images in the top post. – Maverick2805 Jan 24 '18 at 17:42
  • I thought about the approach getPerspectiveTransform() and warpPerspective(), but then I would need to find the contours twice. Is there no possibility to just warpPerspective() 4 points? I am starting to lose my mind here. – Maverick2805 Jan 24 '18 at 23:28
  • cv::perspectiveTransform or cv::transformPerspective or similar – Micka Jan 25 '18 at 00:10
  • I think I need to give up, I am too stupid for this. When applying perspectiveTransform to the points, the value I am getting as a result is incorrect. – Maverick2805 Jan 25 '18 at 21:50
  • in your "markings for problem" image, you can see, that the red line to the white circles doesn't go well through their centers. How did you compute the direction? It should be the line through green-red-green cross, but I think your circle centers aren't well detected/marked and I'm not sure about camera lens distortion in that image. – Micka Jan 26 '18 at 06:53
  • maybe you can just measure and use the corners of the black bar in the middle of the stone? – Micka Jan 26 '18 at 06:54
  • rough check: The crosses look quite ok, how did you compute the red line? – Micka Jan 26 '18 at 06:56
  • I got some "ok" results without knowing the 3D measurements: https://img1.picload.org/image/ddiwopwr/domino.jpg The green circles are used to compute the homography, the yellow circles are computed from the homography and 3D coordinates. – Micka Jan 26 '18 at 09:18
  • ok, the red and pink line are the 3D coordinate system of your solvePnP? I guess for that, your computed center crosses aren't well enough in the 3D center of the holes. You'll have to regard the perspective. Maybe you can use some kind of bounding quad around all the circles as reference? That shouldn't give you those problems. – Micka Jan 26 '18 at 11:23
  • The lines and crosses I have added in Photoshop. Maybe they are a bit off; it was more to show a representation. The solvePnP is working fine for most test images I have. I can add the results to the question once I am at home. For some gray points I have an issue though, because it doesn't detect some circles (the light gray is too close to the white of the tiles). I use HSV converted in a white/grayish range, and Canny to find the circles. Maybe that isn't the best approach, but it's the only one working for me. – Maverick2805 Jan 26 '18 at 16:37

1 Answer

The matrix [R|t] is not square, so by definition you cannot invert it. However, this matrix lives in projective space, which is nothing but an extension of R^n (Euclidean space) with a '1' added as the (n+1)-st element. For compatibility, the matrices that multiply vectors of the projective space are extended by an extra row, with a '1' at their lower-right corner. That is, R becomes

[R|0]
[0|1]

In your case [R|t] becomes

[R|t]
[0|1]

and you can take its inverse which reads as

[R'|-R't]
[0 |  1 ]

where ' denotes the transpose. The portion that you need is the top 3×4 block.
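
In OpenCV this inverse can be computed directly from the solvePnP output; a minimal sketch:

#include <opencv2/opencv.hpp>

using namespace cv;

// invert the pose returned by solvePnP:
// forward:  X_cam   = R * X_world + t
// inverse:  X_world = R' * X_cam - R' * t
static void invertPose(const Mat& rvec, const Mat& tvec, Mat& Rinv, Mat& tinv)
{
    Mat R;
    Rodrigues(rvec, R);  // rotation vector -> 3x3 rotation matrix
    Rinv = R.t();        // inverse of a rotation matrix is its transpose
    tinv = -Rinv * tvec; // the -R't block of the inverted matrix
}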

Since the phone translates in 3D space, you need the depth of the pixel in consideration. This means the answer to your question about whether you need distances in mm/inches is yes. The answer changes only if you can assume that the ratio of camera translation to depth is very small; this is called a weak-perspective camera. The question you're trying to tackle is not an easy one; people are still researching it at PhD level.

– ozlsn
  • I tried to solve this, but I am getting crazy numbers which don't seem to add up. rvec = [-0.6460095212173805; 2.037458031110235; -1.796950744317753], which results in the Rodrigues rotation matrix [-0.8358446378210687, -0.1072498757737782, 0.5383875978997618; -0.5475693554041162, 0.0929663142410786, -0.8315798611310578; 0.03913492619243286, -0.9898761177859267, -0.1364321406010136]. If I transpose the R, multiply -R with t and then multiply by the original coordinates (2009|871|1|1), I am getting -14'201|18657, and that seems fairly wrong – Maverick2805 Jan 24 '18 at 22:58