
I have a stationary camera and want to get the real-world coordinates and rotation of the camera (relative to the pattern). I placed a checkerboard pattern in the FoV of the camera and took some images.

Some might say this question already has an answer on Stack Overflow, but none of the existing answers work for me, so this is NOT a duplicate. I have looked into a few posts, e.g. "Estimate world position of a real camera", "Camera position in world coordinate from cv::solvePnP" (and some more), as well as the OpenCV documentation.

One of those posts has an accepted answer, so I followed it and implemented the code. I don't have the camera setup anymore, so I can't check the results against ground-truth values.

Without ground truth, my idea for checking the results is to calculate the 3D position of the camera and project that 3D position back into the image. In a perfect world, the projected camera position should land precisely in the middle of the image. However, the projected point is often not even close to the center.
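
To make the idea concrete, here is a minimal standalone sketch (made-up rvec/tvec, for illustration only, not values from my setup) of the camera-position computation I rely on, with a round-trip check that C = -R^T @ t really inverts the pose:

import cv2
import numpy as np

# Made-up pose for illustration only
rvec = np.array([[0.1], [0.2], [0.05]], dtype=np.float64)
tvec = np.array([[0.0], [0.0], [1.0]], dtype=np.float64)

# Camera centre in board coordinates: C = -R^T @ t
R, _ = cv2.Rodrigues(rvec)
camera_position = -R.T @ tvec

# Round trip: mapping C back into camera coordinates must give the origin
print(R @ camera_position + tvec)  # ~ [[0.], [0.], [0.]]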

From my understanding, the math behind this idea should work. Here is my code; it does not behave as expected, even though I am fairly sure it should. What am I missing?

import cv2
import numpy as np


class CameraCalibrator:
    def __init__(self, folder, rows, cols, square_size=0.079, singlePnP=True, mtx=None, dist=None):
        self.folder = folder
        self.mtx = mtx
        self.dist = dist
        self.rows = rows
        self.cols = cols
        self.square_size = square_size
        self.objp = None
        self.axis = None
        self.singlePnP = singlePnP
        self.generate_board()

    def generate_board(self):
        # Generate the 3D points of the intersections of the chessboard pattern
        objp = np.zeros((self.rows * self.cols, 3), np.float32)
        objp[:, :2] = np.mgrid[0:self.rows, 0:self.cols].T.reshape(-1, 2)
        self.objp = objp * self.square_size

        # Generate the axis vectors
        self.axis = np.float32([[self.square_size, 0, 0], [0, self.square_size, 0], [0, 0, -self.square_size]]).reshape(-1, 3)

    def estimate_pose(self, image_names):
        # Loop over all images
        for image_name in image_names:
            # Extract chessboard corners
            img = cv2.imread(self.folder + image_name)
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            found_corners, corners = cv2.findChessboardCorners(gray, (self.rows, self.cols), None)
            if found_corners:
                # Refine the corner locations (11x11 pixel search window, as in calibrate_camera)
                criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
                corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria=criteria)

                # Use solve PnP to determine the rotation and translation between camera and 3D object
                ret, rvec, tvec = cv2.solvePnP(self.objp, corners, self.mtx, self.dist)

                # Project the axis into the image
                imgpts, jac = cv2.projectPoints(2 * self.axis, rvec, tvec, self.mtx, self.dist)

                # Draw the axes
                img = self.draw_axes(img, corners, imgpts)

                # Calculate the camera position in board coordinates.
                # Following: https://stackoverflow.com/questions/18637494/camera-position-in-world-coordinate-from-cvsolvepnp?rq=1
                rotM = cv2.Rodrigues(rvec)[0]
                cameraPosition = -rotM.T @ tvec
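                # (The formula above follows from X_cam = rotM @ X + tvec:
                # setting X_cam = 0 and solving for X gives the camera centre
                # in board coordinates, C = -rotM.T @ tvec.)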

                imgpts, jac = cv2.projectPoints(cameraPosition, rvec, tvec, self.mtx, self.dist)
                # Draw a circle in the center of the image (just as a reference) and draw a line from the top left intersection to the projected camera position
                img = self.draw(img, corners[0], imgpts[0])
                cv2.imshow('img', img)
                k = cv2.waitKey(0) & 0xFF

    def draw_axes(self, img, corners, imgpts):
        # Extract the first corner (the top left)
        corner = tuple(corners[0].ravel())
        corner = (int(corner[0]), int(corner[1]))

        # Color format is BGR
        color = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]

        # Iterate over the points
        for i in range(len(imgpts)):
            tmp = tuple(imgpts[i].ravel())
            tmp = (int(tmp[0]), int(tmp[1]))
            img = cv2.line(img, corner, tmp, color[i], 5)
        return img

    def draw(self, img, corners, imgpts):
        corner = tuple(corners[0].ravel())
        corner = (int(corner[0]), int(corner[1]))
        for i in range(len(imgpts)):
            tmp = tuple(imgpts[i].ravel())
            tmp = (int(tmp[0]), int(tmp[1]))
            img = cv2.line(img, corner, tmp, (255, 255, 0), 5)
        cv2.circle(img, (int(img.shape[1] / 2), int(img.shape[0] / 2)), 1, (255, 255, 255), 10)
        return img

    def calibrate_camera(self, images):
        # Reuse the board model generated in generate_board
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
        objp = self.objp
        objpoints = []  # 3d point in real world space
        imgpoints = []  # 2d points in image plane
        
        for img_name in images:
            full_name = self.folder + img_name
            img = cv2.imread(full_name)
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            found_corners, corners = cv2.findChessboardCorners(gray, (self.rows, self.cols), None)
            if found_corners:
                objpoints.append(objp)
                corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
                imgpoints.append(corners2)
                # Draw and display the corners
                cv2.drawChessboardCorners(img, (self.rows, self.cols), corners2, found_corners)
                cv2.imshow('img', img)
                k = cv2.waitKey(0) & 0xFF

        # Initial intrinsic guess for CALIB_USE_INTRINSIC_GUESS
        mtx = np.array([367.47894432, 0.0, 249.3915073,
                        0.0, 367.39795727, 205.2466732,
                        0.0, 0.0, 1.0]).reshape((3, 3))
        dist = np.array([0.10653164, -0.33399435, -0.00111262, -0.00186027, 0.15269198])
        ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], mtx, dist, flags=cv2.CALIB_USE_INTRINSIC_GUESS)
        for img_name in images:
            full_name = self.folder + img_name
            img = cv2.imread(full_name)
            cv2.imshow('img', img)
            k = cv2.waitKey(0) & 0xFF
            # undistort
            dst = cv2.undistort(img, mtx, dist, None, mtx)
            cv2.imshow('img', dst)
            k = cv2.waitKey(0) & 0xFF


def main():
    image_names = ["frame0100.png", "frame1234.png", "frame1345.png", "frame1426.png", "frame1777.png", "frame2860.png", "frame2879.png", "frame3000.png"]
    folder = "/home/X/folder_name/"
    size = 0.079    # Unit is 'meter'

    # Precalibrated camera information
    mtx = np.array([370.68093, 0.0, 250.27853,
                    0.0, 370.65810, 206.94584,
                    0.0, 0.0, 1.0]).reshape((3, 3))
    dist = np.array([0.11336, -0.35520, 0.00076, -0.00117, 0.18745])

    # Number of inner checkerboard corners per row/column (the patternSize for findChessboardCorners)
    rows = 8
    cols = 6

    camera_calibrator = CameraCalibrator(folder, rows=rows, cols=cols, square_size=size, mtx=mtx, dist=dist)
    # The following line is only for camera calibration
    #camera_calibrator.calibrate_camera(image_names)
    camera_calibrator.estimate_pose(image_names)


if __name__ == '__main__':
    main()

The source code should run when copy-pasted. Only 'folder' (and 'image_names') have to be adjusted. The images that I use for testing purposes can be found here: checkerboards.zip

I've run out of ideas. Thanks in advance.

Sheradil
  • Click [Here](https://stackoverflow.com/questions/18637494/camera-position-in-world-coordinate-from-cvsolvepnp?rq=1) to get to the question; the C++ solution (accepted answer) seems to work. –  Mar 29 '23 at 02:03
  • the zip file download does not work – Ralf Ulrich Mar 29 '23 at 12:37
  • @RalfUlrich I updated the file link. Should stay active for 30 days – Sheradil Mar 31 '23 at 11:23
  • you project the camera's position into _its own_ image? that makes no sense. it would not have "a point" in its own image at all. – Christoph Rackwitz Apr 02 '23 at 10:40
  • @ChristophRackwitz Of course it makes sense. With the solvePnP method you get the transformation matrix between 2D and 3D. So you can project 3D points into the image. And while the 3D position of the camera itself doesn't really have a point in the image, the *projected* 3D position surely has. And that one should be precisely in the center of the image (without rounded numbers etc). – Sheradil Apr 03 '23 at 08:13
  • @StephenGzy Well, that's exactly what I am doing here, just in Python instead of C++. But it does not seem to work. The only difference is that I do not use a 4x4 matrix but the projectPoints method, which takes the rvec and tvec (that's what the 4x4 matrix is made of). – Sheradil Apr 03 '23 at 08:16
  • any point that *isn't* equal to the camera origin *can* be projected onto the image plane of the camera. the camera origin itself is *not* projectable. it has z=0. you are asking for the mathematically impossible with great confidence that it is possible. – Christoph Rackwitz Apr 03 '23 at 09:15
  • Yes and no. I had a long talk with a former coworker before I made this post, because I had the same thought (and added a tiny offset to the camera position). But that does not apply here. First, for your statement to be true, the camera position that I calculate would have to be 100% precise, which it never will be due to the nature of number representations in computers. And besides that, you can project the camera position into the image frame: the camera position is always a tiny bit behind the actual world coordinates of the image frame, so the projection would still work. – Sheradil Apr 03 '23 at 09:22
  • since your former coworker isn't here, it is impossible to argue against his misconceptions, or your belief in his authority. -- the camera position is the camera origin. the camera isn't "behind" anything. it is the origin. the *projection plane* is offset to the origin, because it must be. the exact position of that is merely an implementation detail. -- you needn't explain the nature of floating point numbers or measurements. you basically said you want to rely on noise/error for sensible results. this is a terribly bad idea, numerically speaking. divide by a "tiny bit" = huge error – Christoph Rackwitz Apr 03 '23 at 10:13
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/252937/discussion-between-sheradil-and-christoph-rackwitz). – Sheradil Apr 03 '23 at 10:31

0 Answers