Camera pixels to planar world points given 4 known points

Question

I assume my problem is easy, but I still haven't been able to solve it because my linear algebra is rusty. I have read presentations published by several universities, but I can't follow the somewhat non-standardized notation. If anyone has a better example, it would be much appreciated.

Problem: The camera is angled down facing the floor. Given a pixel coordinate, I want to be able to get the respective 3D world coordinate on the plane of the floor.

Known:

4 dots on the floor where I know the Pixel(x,y) coordinates and the associated World(X,Y,Z=0) coordinates.
The camera's position is fixed and I know the displacement in the X,Y,Z directions of the camera.

Unknown:

The camera's rotations about the x, y, and z axes. The camera is rotated primarily about the X axis; the Y and Z rotations are minimal, but I think they should be taken into account.
Distortion coefficients. However, straight lines in the images bend only minimally, and I would much prefer not to bring in the checkerboard calibration procedure. Some error as a result of this is not a deal breaker.

What I've looked into:

A phenomenal example is found here. In essence it's the exact same problem, but I have some follow-up questions:

solvePnP looks to be my friend, but I'm not sure what to do about the camera matrix or the distCoefficients. Is there some way I can avoid the camera matrix and distortion coefficient calibration steps, which I think are done with the checkerboard process (maybe at the cost of some accuracy)? Or is there some simpler way to do this?

Much appreciate your input!

Compute the homography from image to floor coordinates. Isn't that all you need if the camera is static? — Micka, Sep 11 '14 at 13:29

Micka · Accepted Answer · 2014-09-11T14:43:56.057

Try this approach:

Compute the homography from the 4 point correspondences. This gives you all the information needed to transform between image-plane and ground-plane coordinates.
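
The reason a single 3x3 matrix is enough can be seen from the pinhole model. As a sketch, assume a distortion-free camera with intrinsic matrix $K$, rotation columns $r_1, r_2, r_3$, translation $t$, and scale factor $s$ (none of these symbols appear in the answer; they are standard notation introduced here for illustration). For a floor point with $Z = 0$:

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix}
  \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix}
= \underbrace{K \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}}_{H}
  \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
```

So $H$ maps floor coordinates to pixels and $H^{-1}$ maps pixels back to the floor. $H$ is defined only up to scale, leaving 8 unknowns, and each correspondence contributes 2 equations, which is why 4 points (no three of them collinear) are enough.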

The limitation of this approach is that it assumes a uniformly parameterized image plane (pinhole camera), so lens distortion will give you errors, as seen in my example. If you are able to remove the lens distortion effects, this approach should work very well. In addition, you will get some error from slightly wrong pixel coordinates in your correspondences; you can get more stable values if you provide more correspondences.
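
To make the 4-point computation concrete, here is a minimal sketch in plain C++ (no OpenCV) that solves the 8 linear equations directly. It is not how OpenCV is implemented internally; the names `Pt`, `Mat3`, `homographyFrom4` and `applyHomography` are hypothetical, and the last matrix entry is assumed to be non-zero so it can be fixed to 1.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

struct Pt { double x, y; };
using Mat3 = std::array<double, 9>;  // row-major 3x3 matrix

// Homography H with H[8] fixed to 1 such that H maps src[i] to dst[i]
// for exactly 4 correspondences (illustrative helper, not an OpenCV API).
Mat3 homographyFrom4(const std::array<Pt, 4>& src, const std::array<Pt, 4>& dst)
{
    // Augmented 8x9 system: each correspondence contributes two rows.
    double A[8][9] = {};
    for (int i = 0; i < 4; ++i) {
        const double x = src[i].x, y = src[i].y, X = dst[i].x, Y = dst[i].y;
        const double r0[9] = { x, y, 1, 0, 0, 0, -x * X, -y * X, X };
        const double r1[9] = { 0, 0, 0, x, y, 1, -x * Y, -y * Y, Y };
        for (int k = 0; k < 9; ++k) { A[2 * i][k] = r0[k]; A[2 * i + 1][k] = r1[k]; }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int piv = col;
        for (int r = col + 1; r < 8; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
        assert(std::fabs(A[piv][col]) > 1e-12 && "degenerate point configuration");
        for (int k = 0; k < 9; ++k) std::swap(A[col][k], A[piv][k]);
        for (int r = 0; r < 8; ++r) {
            if (r == col) continue;
            const double f = A[r][col] / A[col][col];
            for (int k = col; k < 9; ++k) A[r][k] -= f * A[col][k];
        }
    }
    Mat3 H;
    for (int i = 0; i < 8; ++i) H[i] = A[i][8] / A[i][i];
    H[8] = 1.0;
    return H;
}

// Apply H to a point, including the perspective divide.
Pt applyHomography(const Mat3& H, Pt p)
{
    const double w = H[6] * p.x + H[7] * p.y + H[8];
    return { (H[0] * p.x + H[1] * p.y + H[2]) / w,
             (H[3] * p.x + H[4] * p.y + H[5]) / w };
}
```

Calling `homographyFrom4` with the four pixel positions as `src` and the four floor positions as `dst` gives the image-to-floor mapping; `applyHomography` then converts any pixel to floor coordinates. With more than 4 correspondences the system is overdetermined and needs a least-squares fit, which this sketch does not attempt.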

Using this input image

enter image description here

I read off the 4 corners of one chessboard square using image manipulation software; this corresponds to the 4 points you know in your image. I chose these points (marked green):

enter image description here

Now I've done two things. First, I transform the chessboard pattern coordinates (0,0), (0,1), etc. to the image, which gives a good visual impression of the mapping quality. Second, I transform from image to world: I read the leftmost corner position at image location (87,291), which corresponds to (0,0) in chessboard coordinates. If I transform that pixel location, you would expect (0,0) as the result.

#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>

// Applies a 3x3 homography (stored as doubles) to a 2D point,
// including the perspective divide.
cv::Point2f transformPoint(cv::Point2f current, const cv::Mat& transformation)
{
    cv::Point2f transformedPoint;
    transformedPoint.x = current.x * transformation.at<double>(0,0) + current.y * transformation.at<double>(0,1) + transformation.at<double>(0,2);
    transformedPoint.y = current.x * transformation.at<double>(1,0) + current.y * transformation.at<double>(1,1) + transformation.at<double>(1,2);
    // homogeneous scale factor
    double z = current.x * transformation.at<double>(2,0) + current.y * transformation.at<double>(2,1) + transformation.at<double>(2,2);
    transformedPoint.x /= z;
    transformedPoint.y /= z;

    return transformedPoint;
}

int main()
{
    // image from http://d20uzhn5szfhj2.cloudfront.net/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/2/52440-chess-board.jpg

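    cv::Mat chessboard = cv::imread("../inputData/52440-chess-board.jpg");
    if (chessboard.empty())
    {
        // imread returns an empty matrix if the file could not be read
        std::cerr << "could not read input image" << std::endl;
        return 1;
    }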

    // known input:
    // image locations / read pixel values
    //  478,358
    //  570, 325
    //  615,382
    //  522,417

    std::vector<cv::Point2f> imageLocs;
    imageLocs.push_back(cv::Point2f(478,358));
    imageLocs.push_back(cv::Point2f(570, 325));
    imageLocs.push_back(cv::Point2f(615,382));
    imageLocs.push_back(cv::Point2f(522,417));

    for(unsigned int i=0; i<imageLocs.size(); ++i)
    {
        cv::circle(chessboard, imageLocs[i], 5, cv::Scalar(0,0,255));
    }
    cv::imwrite("../outputData/chessboard_4points.png", chessboard);

    // known input: this is one field of the chessboard. you could enter any (corresponding) real world coordinates of the ground plane here.
    // world location:
    // 3,3
    // 3,4
    // 4,4
    // 4,3

    std::vector<cv::Point2f> worldLocs;
    worldLocs.push_back(cv::Point2f(3,3));
    worldLocs.push_back(cv::Point2f(3,4));
    worldLocs.push_back(cv::Point2f(4,4));
    worldLocs.push_back(cv::Point2f(4,3));


    // for exactly 4 correspondences. for more you can use cv::findHomography
    // this is the transformation from image coordinates to world coordinates:
    cv::Mat image2World = cv::getPerspectiveTransform(imageLocs, worldLocs);
    // the inverse is the transformation from world to image.
    cv::Mat world2Image = image2World.inv();


    // create all known locations of the chessboard (0,0) (0,1) etc we will transform them and test how good the transformation is.
    std::vector<cv::Point2f> worldLocations;
    for(unsigned int i=0; i<9; ++i)
        for(unsigned int j=0; j<9; ++j)
        {
            worldLocations.push_back(cv::Point2f(i,j));
        }


    std::vector<cv::Point2f> imageLocations;

    for(unsigned int i=0; i<worldLocations.size(); ++i)
    {
        // transform the point
        cv::Point2f tpoint = transformPoint(worldLocations[i], world2Image);
        // draw the transformed point
        cv::circle(chessboard, tpoint, 5, cv::Scalar(255,255,0));
    }

    // now test the other way: image => world
    cv::Point2f imageOrigin = cv::Point2f(87,291);
    // draw it to show which origin i mean
    cv::circle(chessboard, imageOrigin, 10, cv::Scalar(255,255,255));
    // transform point and print result. expected result is "(0,0)"
    std::cout << transformPoint(imageOrigin, image2World) << std::endl;

    cv::imshow("chessboard", chessboard);
    cv::imwrite("../outputData/chessboard.png", chessboard);
    cv::waitKey(-1);


}

The resulting image is:

enter image description here

As you can see, there is a considerable amount of error in the data. As I said, this is because of slightly wrong pixel coordinates given as correspondences (and all within a small area!) and because lens distortion prevents the ground plane from appearing as a true plane in the image.

The result of transforming (87,291) to world coordinates is:

[0.174595, 0.144853]

The expected (perfect) result would have been [0,0].
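
To put a number on that mismatch, one option is to measure the distance between mapped and expected floor positions for points not used in the fit. The sketch below is plain C++ with hypothetical names (`WorldPt`, `rmsError`); it is an illustration, not part of the original program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct WorldPt { double x, y; };

// Root-mean-square distance between mapped and expected floor points,
// expressed in world units (here: chessboard squares).
double rmsError(const std::vector<WorldPt>& mapped,
                const std::vector<WorldPt>& expected)
{
    assert(!mapped.empty() && mapped.size() == expected.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < mapped.size(); ++i) {
        const double dx = mapped[i].x - expected[i].x;
        const double dy = mapped[i].y - expected[i].y;
        sum += dx * dx + dy * dy;
    }
    return std::sqrt(sum / static_cast<double>(mapped.size()));
}
```

For the single check point above, mapped [0.174595, 0.144853] against expected [0,0], this comes to roughly 0.23 of a chessboard square.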

Hope this helps.

Hey, this is a pretty cool idea. So essentially I am computing the perspective transformation matrix and then that's how you're cycling back and forth. So I looked more into getting the lens distortion etc, and it doesn't seem *too* bad. Where would that factor in? On first reading in the image, immediately undistort it? — user2891729, Sep 11 '14 at 23:08
Yes, undistort original image (adjust image location of your known points too) and you're done — Micka, Sep 12 '14 at 05:24
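
As a rough illustration of what "adjust the image location of your known points" involves, the sketch below undistorts a single pixel under a simplified model with one radial coefficient. Everything in it is an assumption for illustration: the `Camera` struct, the function names, and the one-coefficient model itself (OpenCV's distortion model has more terms). In practice the intrinsics and coefficients come from calibration, and OpenCV provides `cv::undistort` for whole images and `cv::undistortPoints` for individual points.

```cpp
#include <cassert>
#include <cmath>

struct Pixel { double x, y; };

// Hypothetical intrinsics plus a single radial distortion coefficient k1.
struct Camera { double fx, fy, cx, cy, k1; };

// Forward model: ideal (pinhole) pixel -> distorted pixel as seen in the image.
Pixel distortPixel(const Camera& c, Pixel p)
{
    const double xn = (p.x - c.cx) / c.fx;   // normalized coordinates
    const double yn = (p.y - c.cy) / c.fy;
    const double s = 1.0 + c.k1 * (xn * xn + yn * yn);
    return { c.fx * xn * s + c.cx, c.fy * yn * s + c.cy };
}

// Inverse model by fixed-point iteration: distorted pixel -> ideal pixel.
// Converges quickly when the distortion is mild.
Pixel undistortPixel(const Camera& c, Pixel p, int iterations = 20)
{
    assert(iterations > 0);
    const double xd = (p.x - c.cx) / c.fx;
    const double yd = (p.y - c.cy) / c.fy;
    double xn = xd, yn = yd;                 // initial guess: no distortion
    for (int i = 0; i < iterations; ++i) {
        const double s = 1.0 + c.k1 * (xn * xn + yn * yn);
        xn = xd / s;
        yn = yd / s;
    }
    return { c.fx * xn + c.cx, c.fy * yn + c.cy };
}
```

The undistorted pixel positions of the known floor points are the ones to use as correspondences when computing the homography, and any pixel to be mapped later should be undistorted the same way first.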

score 0 · Answer 2 · answered Sep 10 '14 at 17:44

Sure, you can set the cameraMatrix to the identity matrix (eye(3)) and set distCoefficients to NULL, and solvePnP will assume you have a perfect camera. This will, as you say, introduce some additional inaccuracy, but you will still get an answer.

If you find that your results are not accurate enough, camera calibration really isn't that big of a deal.

beaker


Hey, I appreciate the response. Duh, that should have occurred to me to assume perfect conditions. Let me give it a try before I mark the answer. – user2891729 Sep 10 '14 at 18:14
