
I have one Kinect camera and one webcam, I'm trying to find the rotation/translation matrix between the Kinect and the webcam using OpenCV. Here is the setup:

[setup image]

The two cameras are facing in the same direction. I can get the intrinsic matrix of each camera, but I'm not sure how to get the relative position/orientation between them.

I did some research and found the findEssentialMat() function. Apparently it returns an essential matrix (although this function does not seem suitable, since it assumes that the focal length and principal point are the same for both cameras), which can then be used with:

  1. recoverPose()
  2. decomposeEssentialMat() -> if I understood correctly, it returns 4 different solutions; should I use this function? (A rough sketch of how I would call these is below.)
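
For reference, this is roughly how I imagine the calls would look, assuming I have matched point lists between the two images and a single focal length / principal point (which, as noted above, is not really true for my setup):

```cpp
// Sketch only: pts1/pts2 are matched points between the two images, and both
// cameras are assumed to share the same focal length and principal point
// (which is not the case for a Kinect + webcam pair).
#include <opencv2/opencv.hpp>
#include <vector>

void essentialSketch(const std::vector<cv::Point2f>& pts1,
                     const std::vector<cv::Point2f>& pts2,
                     double focal, cv::Point2d pp)
{
    cv::Mat mask;
    // Essential matrix from point correspondences (RANSAC rejects outliers)
    cv::Mat E = cv::findEssentialMat(pts1, pts2, focal, pp, cv::RANSAC, 0.999, 1.0, mask);

    // recoverPose() cheirality-checks the 4 decompositions and returns one R, t
    cv::Mat R, t;
    cv::recoverPose(E, pts1, pts2, R, t, focal, pp, mask);

    // Note: with this approach t is only known up to scale.
}
```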

Thank you very much !

EDIT: What about the stereoCalibrate() function? My setup does not really correspond to a stereo camera, though...

EDIT2: I gave the "stereo_calib.cpp" example provided with OpenCV a try. Here is my result; I don't really know how to interpret it:

[image: rectified image pair produced by stereo_calib.cpp]

It also produces an "extrinsics.yml" file where I can find the R and T matrices, but I don't know in which units they are expressed. I changed the squareSize variable in the source code several times, but the matrices do not seem to change at all.
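
This is roughly how I read the file back to inspect R and T (assuming the keys that stereo_calib.cpp writes):

```cpp
// Sketch only: reading back the extrinsics written by the stereo_calib.cpp sample.
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::FileStorage fs("extrinsics.yml", cv::FileStorage::READ);
    cv::Mat R, T;
    fs["R"] >> R;   // 3x3 rotation matrix
    fs["T"] >> T;   // 3x1 translation vector
    std::cout << "R = " << R << "\nT = " << T << std::endl;
    return 0;
}
```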

Gustanas
  • Hi! Thank you all for your replies; I'm currently busy preparing for a job interview. Next week I'll run some tests and give you all some feedback. Thanks for helping! – Gustanas Mar 04 '14 at 12:41

3 Answers


Use stereoCalibrate. Your setup is exactly like a stereo camera.
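
A minimal sketch of that call, assuming you have already detected chessboard corners in image pairs from both cameras and estimated each camera's intrinsics (variable names are placeholders; the argument order shown is the OpenCV 3.x one):

```cpp
// Sketch only: objectPoints/imagePoints1/imagePoints2 come from chessboard
// detection (cv::findChessboardCorners) on synchronized image pairs.
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

void calibratePairSketch(
    const std::vector<std::vector<cv::Point3f>>& objectPoints, // 3D corners, in real units (e.g. mm)
    const std::vector<std::vector<cv::Point2f>>& imagePoints1, // corners in Kinect RGB images
    const std::vector<std::vector<cv::Point2f>>& imagePoints2, // corners in webcam images
    cv::Mat K1, cv::Mat D1, cv::Mat K2, cv::Mat D2,            // known intrinsics/distortion
    cv::Size imageSize)
{
    cv::Mat R, T, E, F;
    // CALIB_FIX_INTRINSIC: keep the intrinsics you already estimated, solve only for R, T
    double rms = cv::stereoCalibrate(objectPoints, imagePoints1, imagePoints2,
                                     K1, D1, K2, D2, imageSize,
                                     R, T, E, F,
                                     cv::CALIB_FIX_INTRINSIC,
                                     cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 100, 1e-6));

    // R, T map points from the first camera's frame into the second's.
    // T is expressed in the same units as objectPoints (e.g. millimetres if the
    // square size used to build objectPoints was in millimetres).
    std::cout << "RMS reprojection error: " << rms << std::endl;
}
```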

Francesco Callari
  • Thank you ! I tried it but I don't really understand the units used for the Rotation/translation matrices. I edited my question – Gustanas Feb 26 '14 at 23:15
  • The images you posted are rectified stereo pairs - note that corresponding checkerboard corners are near the same vertical line. However, you may want to iterate using more image pairs, since those correspondences don't look very accurate (hard to tell given the size of the images you posted). The rotation matrix R has no scale/units - the columns are unit vectors. IIRC the yml format is defined so that the translation vector T is scaled taking the width of one square of the calibration grid as 1, so you just need to multiply by the actual physical width of your target's squares. – Francesco Callari Feb 27 '14 at 02:38

I think stereoCalibrate is the way to go if you are interested in the depth map and in aligning the 2 images (and I think this is an important issue, even though I don't know exactly what you're trying to do and even though you already have a depth map from the Kinect).

But, if I understand correctly, you also want to find the position of the cameras in the world. You can do that by having the same known geometry in both views. This is normally achieved with a chessboard pattern lying on the floor, seen by both (fixed-position) cameras.

Once you have the 3D points of a known geometry and the corresponding 2D points projected onto the image plane, you can find the 3D position of each camera independently, relative to a 3D world whose origin sits at one corner of the chessboard.

In this way what you're going to achieve is something like this image:

[image: two camera poses expressed relative to the chessboard's world origin]

To find the 3D position of each camera relative to the chessboard you can use cv::solvePnP to compute the extrinsic matrix of each camera independently. There are some issues about the direction of the camera (the ray pointing from the camera to the world origin) that you have to handle (again: independently for each camera) if you want to visualise them (e.g. in OpenGL), plus some matrix algebra and angle handling.
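
A minimal sketch of that idea, assuming both cameras see the same chessboard at the same time and you already have each camera's intrinsics (the function and variable names are just placeholders):

```cpp
// Sketch only: both cameras observe the same chessboard. objectPoints are the
// corner coordinates in the chessboard's own frame (e.g. in metres);
// cornersKinect/cornersWebcam are the detected 2D corners in each image.
#include <opencv2/opencv.hpp>
#include <vector>

void relativePoseSketch(const std::vector<cv::Point3f>& objectPoints,
                        const std::vector<cv::Point2f>& cornersKinect,
                        const std::vector<cv::Point2f>& cornersWebcam,
                        const cv::Mat& K1, const cv::Mat& D1,
                        const cv::Mat& K2, const cv::Mat& D2)
{
    cv::Mat rvec1, tvec1, rvec2, tvec2;
    cv::solvePnP(objectPoints, cornersKinect, K1, D1, rvec1, tvec1); // board -> Kinect
    cv::solvePnP(objectPoints, cornersWebcam, K2, D2, rvec2, tvec2); // board -> webcam

    // Convert the Rodrigues rotation vectors into full 3x3 rotation matrices
    cv::Mat R1, R2;
    cv::Rodrigues(rvec1, R1);
    cv::Rodrigues(rvec2, R2);

    // Relative transform Kinect -> webcam: P_webcam = R * P_kinect + T
    cv::Mat R = R2 * R1.t();
    cv::Mat T = tvec2 - R * tvec1;
}
```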

For a detailed description of the math I can point you to the famous Multiple View Geometry book.

See also my previous answer on augmented reality and integration between OpenCV and OpenGL (i.e. how to use the extrinsic matrix, and the T and R matrices that can be decomposed from it, which represent the position and orientation of the camera in the world).

Just out of curiosity: why are you using a normal camera PLUS a Kinect? The Kinect already gives you the depth map that we usually try to obtain with two stereo cameras. I don't understand exactly what kind of data an additional normal camera can give you beyond what a calibrated Kinect, with good use of the extrinsic matrix, already provides.

PS: the image is taken from this nice OpenCV introductory blog, but I think that post is not very relevant to your question, because it is about the intrinsic matrix and distortion parameters, which it seems you already have. Just to clarify.

EDIT: regarding the units of the extrinsic data: they are normally expressed in the same units as the 3D points of the chessboard. So if you identify a chessboard square's corner points in 3D as P(0,0), P(1,0), P(1,1), P(0,1) and use them with solvePnP, the translation of the camera will be measured in units of "chessboard square size"; if a square is 1 metre long, the unit of measure will be metres. For rotations, the units are normally angles in radians, but it depends on how you extract them with cv::Rodrigues and how you compute the three yaw-pitch-roll angles from a rotation matrix.
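
As an illustration of how the unit choice enters, here is a small sketch (squareSize and the helper names are placeholders): the translation returned by solvePnP is in whatever unit you used when filling the 3D chessboard points.

```cpp
// Sketch only: how the chessboard's 3D points are built decides the unit of tvec.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Point3f> makeBoardPoints(int cols, int rows,
                                         float squareSize /* e.g. 0.025f metres */)
{
    std::vector<cv::Point3f> pts;
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            pts.push_back(cv::Point3f(x * squareSize, y * squareSize, 0.0f)); // board lies in the Z = 0 plane
    return pts;
}

// After solvePnP: camera position in the chessboard/world frame (same units as squareSize)
cv::Mat cameraPositionInWorld(const cv::Mat& rvec, const cv::Mat& tvec)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R);   // rotation vector -> rotation matrix
    return -R.t() * tvec;     // world coordinates of the camera centre
}
```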

nkint
  • Hi, sorry for the late reply. I use another camera because my goal is to map the pictures taken by my webcam onto the 3D surface created by the Kinect. I do this because the webcam will later be replaced by an infrared camera (and we are afraid of interference between the infrared camera and the Kinect). If I understood your answer, solvePnP will give me the rvec and tvec for the 2 cameras, and once I have them I can find the relative transformation? Also, during the calibration step, does the size I give to my squares matter? (since you said the units will change if I modify P(0,0), etc.) – Gustanas Mar 17 '14 at 20:03
  • yes, you correctly understood the solvePnP thing but remember to use cv::Rodrigues. then yes, the size of the squares matters! does this answer the question? – nkint Mar 18 '14 at 08:44
  • Yes it does, you were really helpful thank you so much. I think I understand now, Rodrigues will give me the rotation matrix. As for the translation matrix, should I use tvec without changing anything ? Or should I use this: cameraPosition = -np.matrix(rotM).T * np.matrix(tvec) (I saw this from your question here: http://stackoverflow.com/questions/18637494/camera-position-in-world-coordinate-from-cvsolvepnp) – Gustanas Mar 18 '14 at 14:51
  • it depends on your coordinate system. see here: http://ksimek.github.io/2012/08/14/decompose/ – nkint Mar 18 '14 at 16:52

Just put the Kinect behind your web camera. The Kinect will give you the translation of the web camera from its depth map. The relative rotation can be calculated by the Kinect from a plane rigidly attached to the web camera. This will work if you don't care too much about accuracy, and I assume that stereo in this case is irrelevant, since the Kinect already gives you a depth map.

In case you need more accurate results, you need to specify your goal. For example, the goal of stereo calibration is to produce two homography matrices that can be applied to each camera's images in order to rectify them, or in other words to make pixel correspondences lie in the same column (for your setup). This simplifies the search for stereo matches. What is your goal?
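
For completeness, here is a sketch of how the rectifying transforms could be obtained once R and T are known; for a calibrated pair the usual route is cv::stereoRectify followed by cv::initUndistortRectifyMap (rather than homographies computed directly). Variable names are placeholders, and this assumes both images share the same size, which a Kinect + webcam pair may not.

```cpp
// Sketch only: rectify an image pair given intrinsics (K1, D1, K2, D2) and the
// R, T between the cameras obtained from stereoCalibrate.
#include <opencv2/opencv.hpp>

void rectifySketch(const cv::Mat& K1, const cv::Mat& D1,
                   const cv::Mat& K2, const cv::Mat& D2,
                   const cv::Mat& R, const cv::Mat& T,
                   const cv::Mat& img1, const cv::Mat& img2)
{
    cv::Size size = img1.size();

    // Rectification rotations (R1, R2) and new projection matrices (P1, P2)
    cv::Mat R1, R2, P1, P2, Q;
    cv::stereoRectify(K1, D1, K2, D2, size, R, T, R1, R2, P1, P2, Q);

    // Per-camera remap tables, then warp both images so correspondences align
    cv::Mat map1x, map1y, map2x, map2y, rect1, rect2;
    cv::initUndistortRectifyMap(K1, D1, R1, P1, size, CV_32FC1, map1x, map1y);
    cv::initUndistortRectifyMap(K2, D2, R2, P2, size, CV_32FC1, map2x, map2y);
    cv::remap(img1, rect1, map1x, map1y, cv::INTER_LINEAR);
    cv::remap(img2, rect2, map2x, map2y, cv::INTER_LINEAR);
}
```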

Vlad