
I have a rectangular target of known dimensions and location on a wall, and a mobile camera on a robot. As the robot is driving around the room, I need to locate the target and compute the location of the camera and its pose. As a further twist, the camera's elevation and azimuth can be changed using servos. I am able to locate the target using OpenCV, but I am still fuzzy on calculating the camera's position (actually, I've gotten a flat spot on my forehead from banging my head against a wall for the last week). Here is what I am doing:

  1. Read in previously computed camera intrinsics file
  2. Get the pixel coordinates of the 4 points of the target rectangle from the contour
  3. Call solvePnP with the world coordinates of the rectangle, the pixel coordinates, the camera matrix and the distortion matrix
  4. Call projectPoints with the rotation and translation vectors
  5. ???

I have read the OpenCV book, but I guess I'm just missing something on how to use the projected points, rotation and translation vectors to compute the world coordinates of the camera and its pose (I'm not a math wiz) :-(

Edit 2013-04-02: Following the advice from "morynicz", I have written this simple standalone program.

#include <cstdio>
#include <vector>
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

// Print a labeled matrix, one row per line
static void dump_matrix (const Mat &m, const char *label)
{
    printf ("%s\n", label);
    for (int row = 0; row < m.rows; row++)
    {
        for (int col = 0; col < m.cols; col++)
            printf ("%.4f ", m.at<double> (row, col));
        printf ("\n");
    }
}

int main (int argc, char** argv)
{
    const char      *calibration_filename = argc >= 2 ? argv [1] : "M1011_camera.xml";
    FileStorage     camera_data (calibration_filename, FileStorage::READ);
    Mat             camera_intrinsics, distortion;
    vector<Point3d> world_coords;
    vector<Point2d> pixel_coords;
    Mat             rotation_vector, translation_vector, rotation_matrix;
    Mat             camera_rotation_vector, camera_translation_vector;

    // Read previously computed camera intrinsics
    camera_data ["camera_matrix"] >> camera_intrinsics;
    camera_data ["distortion_coefficients"] >> distortion;
    camera_data.release ();

    // Target rectangle corner coordinates in feet (world frame)
    world_coords.push_back (Point3d (10.91666666666667, 10.01041666666667, 0));
    world_coords.push_back (Point3d (10.91666666666667, 8.34375, 0));
    world_coords.push_back (Point3d (16.08333333333334, 8.34375, 0));
    world_coords.push_back (Point3d (16.08333333333334, 10.01041666666667, 0));

    // Corresponding corner coordinates in the image (pixels)
    pixel_coords.push_back (Point2d (284, 204));
    pixel_coords.push_back (Point2d (286, 249));
    pixel_coords.push_back (Point2d (421, 259));
    pixel_coords.push_back (Point2d (416, 216));

    // Get vectors for the world->camera transform
    solvePnP (world_coords, pixel_coords, camera_intrinsics, distortion, rotation_vector, translation_vector, false, 0);
    dump_matrix (rotation_vector, "Rotation vector");
    dump_matrix (translation_vector, "Translation vector");

    // We need the inverse of the world->camera transform (camera->world) to
    // calculate the camera's location
    Rodrigues (rotation_vector, rotation_matrix);
    Rodrigues (rotation_matrix.t (), camera_rotation_vector);
    Mat t = translation_vector.t ();
    camera_translation_vector = -camera_rotation_vector * t;

    printf ("Camera position %f, %f, %f\n", camera_translation_vector.at<double>(0), camera_translation_vector.at<double>(1), camera_translation_vector.at<double>(2));
    printf ("Camera pose %f, %f, %f\n", camera_rotation_vector.at<double>(0), camera_rotation_vector.at<double>(1), camera_rotation_vector.at<double>(2));
    return 0;
}

The pixel coordinates I used in my test are from a real image that was taken about 27 feet left of the target rectangle (which is 62 inches wide and 20 inches high), at about a 45 degree angle. The output is not what I'm expecting. What am I doing wrong?

Rotation vector
2.7005
0.0328
0.4590

Translation vector
-10.4774
8.1194
13.9423

Camera position -28.293855, 21.926176, 37.650714
Camera pose -2.700470, -0.032770, -0.459009

Will it be a problem if my world coordinates have the Y axis inverted from that of OpenCV's screen Y axis? (The origin of my coordinate system is on the floor to the left of the target, while OpenCV's origin is the top left of the screen.)

What units is the pose in?

BCat

2 Answers


You get the translation and rotation vectors from solvePnP, which tell You where the object is in the camera's coordinates. You need to get the inverse transform.

The transform camera -> object can be written as the matrix [R T; 0 1] in homogeneous coordinates. The inverse of this matrix, using its special properties, is [R^t -R^t*T; 0 1], where R^t is R transposed. You can get the R matrix from the rotation vector with the Rodrigues transform. This way You get the translation vector and rotation matrix for the object -> camera transformation.
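Written out (just restating the paragraph above in equation form, with R and T taken straight from solvePnP):

$$\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} R^{\top} & -R^{\top}T \\ 0 & 1 \end{bmatrix}$$

so the rotation part of the inverse is $R^{\top}$ and the translation part, which is the camera's location relative to the object, is $-R^{\top}T$.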

If You know where the object lies in world coordinates, You can multiply the world -> object transform by the object -> camera transform to extract the camera's translation and pose.
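A minimal sketch of that composition (the function name and the R_wo/t_wo inputs are placeholders, not from this answer): R_wo and t_wo describe where the object sits in the world, i.e. p_world = R_wo * p_object + t_wo. In the question above the target corners are already given in world coordinates, so R_wo would be the identity and t_wo zero.

    #include <opencv2/opencv.hpp>

    // Sketch: given rvec/tvec from solvePnP and the object's placement in the
    // world (R_wo, t_wo, measured by you), return the camera's position in
    // world coordinates.
    static cv::Mat cameraPositionInWorld (const cv::Mat &rvec, const cv::Mat &tvec,
                                          const cv::Mat &R_wo, const cv::Mat &t_wo)
    {
        cv::Mat R;
        cv::Rodrigues (rvec, R);           // 3x3 rotation matrix from the rotation vector
        cv::Mat t_co = -R.t () * tvec;     // camera position in the object's coordinates
        return R_wo * t_co + t_wo;         // lift it into world coordinates
    }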

The pose is described either by a single vector or by the R matrix; You will surely find it in Your book. If it's "Learning OpenCV", You will find it on pages 401-402 :)

Looking at Your code, You need to do something like this

    cv::Mat R;
    cv::Rodrigues(rotation_vector, R);

    cv::Mat cameraRotationVector;

    cv::Rodrigues(R.t(),cameraRotationVector);

    cv::Mat cameraTranslationVector = -R.t()*translation_vector;

cameraTranslationVector contains camera coordinates. cameraRotationVector contains camera pose.

morynicz
  • I'm not sure I follow you. If R is a 3x3 matrix (from rotation vector output of solvePnP passed to Rodrigues) and it is concatenated with T, which is the translation vector (output from solvePnP), that would result in a 3x4 camera->object matrix. Then I use cv::transpose as you showed to invert the matrix, giving me the object->camera transform matrix. Where does the world->object transform matrix come from? Do I use projectPoints or findHomography? I then multiply it by the object->camera matrix computed earlier. How do I extract the camera's translation and pose from the resulting matrix? – BCat Apr 01 '13 at 01:01
  • R is a 3x3 rotation matrix, T is a 3x1 translation vector. If You put zeros below R and 1 below T, You get a 4x4 transformation matrix in homogeneous coordinates (3 dims + 1 location vector). The world->object transformation comes from Your own assumptions and measurements. A slightly different way to do this transform is explained on pages 379-380 of "Learning OpenCV" – morynicz Apr 01 '13 at 07:04
  • I think I understand 379-380, which describes the translation from object to camera coordinates. I'm still hazy on the "world->object" transform. Is that just the object's world coordinates? Is there some code you can point me to that calculates the camera's pose and XYZ position? Between being new to the OpenCV library and new to the transformation concepts, this is really kicking my butt! – BCat Apr 02 '13 at 04:35
  • Yep, that's the object's coordinates written down in a matrix (sorry for not being clear). On the subject of calculating the camera's position and all, [Aruco](http://sourceforge.net/projects/aruco/) is an augmented reality library, which does this all the time for its markers. Take a look at how they are doing it. – morynicz Apr 02 '13 at 09:09
  • I've written a standalone program that I believe follows your advice, but I must have misunderstood something along the way. Any ideas? – BCat Apr 03 '13 at 08:41
  • How did You do the camera calibration? I believe the units returned are the same units You used to specify the size of the calibration object. Also, it would help to know what translation vector You got from the solvePnP function. The translation length shouldn't change, so that way You could track down where the bug is. – morynicz Apr 03 '13 at 09:06
  • I used the OpenCV Calibration program, and the OpenCV checkerboard. By "units", do you mean the value for "square_size" in the camera intrinsics XML file? That was set to 1.0. I updated the code to more closely follow the code example that you provided (I didn't see it originally). By the way, thank you very much for your help! – BCat Apr 03 '13 at 09:19
  • I made a mistake in my code, use `camera_translation_vector = -rotation_matrix.t() * t;` Use the actual square size in units You use (and even better, convert to metric system ;)), because right now I think the units might mix up. – morynicz Apr 03 '13 at 09:31
  • I'd LOVE it if my idiot country would switch to the metric system, but I have to use imperial units on this project. I wasn't reading the square_size from the camera intrinsics file. I just measured the square on the chessboard, and it is (shudder) 15/16 inches or 0.078125 feet. So, I should use that as a scale factor for the camera_translation_vector? – BCat Apr 03 '13 at 09:46
  • I think your correction (above) isn't right, because T (the transpose of translation_vector) is a 3x1 matrix, and it would have to be a 1x3 in order to multiply it with the rotation matrix. It has to be `camera_translation_vector = -rotation_matrix.t () * translation_vector;` Right? That will compile/run, but the numbers it gives (4.8, 1.5, 18.55) don't make much sense; the Z distance should be closer to 27 – BCat Apr 03 '13 at 10:21
  • T should be 3 rows 1 column. Beyond that there is nothing I can tell, because it should work. Probably You would need to use this scale factor. – morynicz Apr 03 '13 at 11:11
  • OK, thank you. I'll go through and verify everything and let you know – BCat Apr 03 '13 at 11:49
  • @morynicz hi, great answer, it helped me a lot. But have you had any success porting Aruco AR to Unity3D? I played around with Aruco and Ogre for a while, but putting the position and orientation vectors into Unity3D always gives me the wrong object rotation. Thanks! – flankechen Sep 09 '15 at 03:54
  • @flankechen I played some with Aruco but never did anything more than check demos. Comments are not the place for chats like this. – morynicz Sep 11 '15 at 16:03
  • With regard to BCat's [comment](http://stackoverflow.com/questions/15727527/how-to-determine-world-coordinates-of-a-camera#comment22441211_15728961); R is 3x3, R^t is 3x3 , T is 3x1 , So R^t * T is (3x3)*(3x1) --> 3x1 , as required for the translation vector from world/object coordinates to camera coordinates. – WillC Feb 25 '16 at 08:58

It took me forever to understand it, but the pose meaning is the rotation about each axis - x, y, z. It is in radians. The values are between minus pi and pi (-3.14 to 3.14).

Edit: I might have been mistaken. I read that the pose is the vector which indicates the direction of the camera, and the length of the vector indicates how much to rotate the camera around that vector.
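A minimal sketch of that reading (not part of the original answer), reusing the rotation vector printed in the question above: the vector's norm is the rotation angle in radians, its direction is the rotation axis, and cv::Rodrigues turns it into the equivalent 3x3 rotation matrix.

    #include <opencv2/opencv.hpp>
    #include <cstdio>

    int main ()
    {
        // Rotation vector as printed in the question above
        cv::Mat rvec = (cv::Mat_<double> (3, 1) << 2.7005, 0.0328, 0.4590);

        double  angle = cv::norm (rvec);   // rotation angle in radians
        cv::Mat axis  = rvec / angle;      // unit-length rotation axis

        cv::Mat R;
        cv::Rodrigues (rvec, R);           // equivalent 3x3 rotation matrix

        printf ("angle = %.4f rad, axis = (%.4f, %.4f, %.4f)\n",
                angle, axis.at<double> (0), axis.at<double> (1), axis.at<double> (2));
        return 0;
    }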

Amit Klein
  • The rotation vector you get from the pose (`solvePnP`) is the compact representation of a Rodrigues rotation matrix. They are NOT Euler angles (ex: pitch, yaw, roll). Use `Rodrigues`[1] to get the rotation matrix. If you must, you can convert between a Rodrigues matrix and Euler angles, see [2]. [1] http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#rodrigues [2] http://www.staff.city.ac.uk/~sbbh653/publications/euler.pdf – Fraser Harris Apr 08 '16 at 18:00