
I have the 3D coordinates of 4 coplanar points of my target in the object coordinate system. I also have their 2D coordinates in every frame of a video. I have also calculated the intrinsic parameters (M) of the camera, and the R (rotation) and t (translation) matrices between the object coordinate system and the camera coordinate system, using solvePnP(). I have read the complete process from here, which is very clear and similar to the process I followed. Therefore I wanted to use the same equation


s [u v 1]^T = M (R [X Y Z]^T + t)

for calculating my 3D coordinates, but I have no constant for calculating s, as the link explains. My target rotates about the x axis of the OpenCV coordinate system. My questions are:

  1. Can anyone suggest a way to find s? Is it mandatory for this calculation, or can I use s = 1?
  2. Are there any other methods for calculating the 3D point with the parameters I have?
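
For concreteness, the equation can be written out in numpy; all numbers below are made-up stand-ins for the calibrated M, R and t:

```python
import numpy as np

# made-up stand-ins for the calibrated quantities
M = np.array([[700, 0, 320], [0, 700, 240], [0, 0, 1]], float)  # intrinsics
R = np.eye(3)                                                   # rotation
t = np.array([0.0, 0.0, 4.0])                                   # translation

Pw = np.array([0.5, -0.2, 0.0])   # a 3D point in object coordinates
p = M @ (R @ Pw + t)              # this is s * [u, v, 1]^T
s = p[2]                          # s is just the third component
u, v = p[0] / s, p[1] / s
```

Note that s takes a different value for every point, which is why it cannot be treated as a single constant.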


user2958957

4 Answers


The variable s has a specific meaning: there is a one-to-many correspondence between a 2D point and its 3D back-projections. In other words, there is an infinite number of possible 3D points lying on a ray that terminates in, or is emitted from, a pixel in the direction (u, v, f). This is what s is about: just an indicator of a one-to-many relationship.

It seems that Francesco is talking about the general case of structure from motion, where a metric reconstruction is ambiguous up to scale. The question, however, is probably quite different. Let me rephrase it and tell me if I got it right: you have a static object coordinate system which you know. You have a target that rotates in this system around the X axis, and you know the 3D coordinates of 4 points in this system at zero rotation. To get the new 3D coordinates after rotation, all you need is the rotation angle, while you are given a set of 2D projections of your known points. This is an EASY task, if it is what you are really after.

Why is the task easy? Every point generates two constraints (one for u and one for v), while the number of unknowns is one - the angle - so a single point is enough to calculate it. Knowing this angle, you can rotate your known 3D points to update their coordinates. Overall, only 1 point is enough to solve the task:

  1. Multiply both sides of the pin-hole camera equation by the inverse of the intrinsic matrix from the left to get rid of the intrinsic parameters. You will end up with this: s' [u' v' 1]^T = A [X Y Z]^T + t, where A = R*Ralpha.
  2. Technically Ralpha - our unknown - depends only on the angle alpha, but since the dependence is non-linear we treat it as a linear problem in two entries: s = sin(alpha) and c = cos(alpha), where alpha is the angle of rotation around the x axis

              1  0   0
    Ralpha =  0  c  -s
              0  s   c
    
  3. Get rid of s' by noting that s' = a31*X + a32*Y + a33*Z + tz and plugging it into the two constraints:

    s'u' = (a31*X + a32*Y + a33*Z + tz)u' = a11*X + a12*Y + a13*Z + tx

    s'v' = (a31*X + a32*Y + a33*Z + tz)v' = a21*X + a22*Y + a23*Z + ty

Finding matrix A is now a simple task of solving the linear system of equations Kx = b, where, after rearranging terms, we have

b = [tx - tz*u', ty - tz*v']^T,

x = [a11, a12, a13, a21, a22, a23, a31, a32, a33]^T

For a single point correspondence, K is

-X, -Y, -Z, 0,   0,  0, Xu’, Yu’, Zu’
 0,  0,  0, -X, -Y, -Z, Xv’, Yv’, Zv’

But one can add more rows if there are more correspondences. Solving this with the pseudo-inverse gives x = (K^T K)^-1 K^T b, which can be further optimized via non-linear minimization of the quadratic residuals.

After you have calculated x and used it to reassemble A, you have to make sure that A is a true rotation matrix. Normally this is done through SVD: A = U L V^T, then reassign A = U V^T. Finally, get Ralpha = R^T A, which gives you a rotation matrix that you can apply to your known 3D coordinates to get their new values in the object coordinate system, or use the whole matrix A to get them in the camera coordinate system.
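
A numpy sketch of the whole recipe on synthetic data (all values below are made-up ground truth used only to verify the method; note that the linear relaxation solves for all 9 entries of A, so this sketch uses six points in general, non-coplanar position to make the system full rank - with fewer points one would have to exploit the single-angle parameterization directly):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], float)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)

# made-up ground truth: extrinsics (R, t) as from solvePnP, and the
# unknown target rotation alpha about the x axis
alpha = 0.3
R = rot_z(0.2) @ rot_x(0.1)
t = np.array([0.1, -0.2, 5.0])
A_true = R @ rot_x(alpha)

# six made-up 3D points in general (non-coplanar) position
P = np.array([[1.0, 0.2, -0.3], [0.5, 1.1, 0.4], [-0.7, 0.3, 1.0],
              [0.2, -0.8, 0.6], [1.2, 0.5, 0.9], [-0.4, -0.6, -0.2]])

# build K and b from the normalized projections u', v'
K_rows, b_rows = [], []
for X, Y, Z in P:
    q = A_true @ np.array([X, Y, Z]) + t
    u, v = q[0] / q[2], q[1] / q[2]            # intrinsics already removed
    K_rows.append([-X, -Y, -Z, 0, 0, 0, X*u, Y*u, Z*u])
    K_rows.append([0, 0, 0, -X, -Y, -Z, X*v, Y*v, Z*v])
    b_rows += [t[0] - t[2]*u, t[1] - t[2]*v]

# least squares: x = (K^T K)^-1 K^T b
x, *_ = np.linalg.lstsq(np.array(K_rows), np.array(b_rows), rcond=None)
A = x.reshape(3, 3)

# project A onto the rotation group: A = U L V^T  ->  A := U V^T
U, _, Vt = np.linalg.svd(A)
A = U @ Vt

# peel off the known extrinsic rotation and read the angle out of Ralpha
Ralpha = R.T @ A
alpha_est = np.arctan2(Ralpha[2, 1], Ralpha[1, 1])
```

With noisy measurements the same code works unchanged; only the residual of the least-squares step grows.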

This may look messy, but it is a typical set of steps for getting, say, extrinsic camera parameters, and you have already done this (though you probably used a library function).

Vlad
  • Yep, what Vlad says (I too missed the bit about the OP having actual 3D points). There are a few caveats: 1 - I can hardly believe you have a pure rotation about the calibrated axis, unless your rig is specifically manufactured and calibrated that way (i.e. with an OpenCV calibration target that has been carefully aligned with the mechanical axis of rotation). Of course this does not apply if your world coordinate system is instead inferred from a one-axis rotation (i.e. by moving the calibration target about that axis during camera calibration, and fitting afterwards). – Francesco Callari Feb 27 '14 at 20:22
  • 2 - From 1, it follows that you need to fit a full 3-axis rotation to your plane. – Francesco Callari Feb 27 '14 at 20:22
  • 3 - All of this is perfectly irrelevant anyway. Since the world-to-camera transformation for your 3D points is a homography, AND you know the physical object size on the plane, you don't need to use a calibrated setup: just compute the homography, apply it to the 3D points, then scale so that the width/height ratio of your object is preserved. – Francesco Callari Feb 27 '14 at 20:24
  • Good point Francesco. One has to know the intrinsic matrix, though, to decompose the homography and get 3D points: M*[R|T] = H – Vlad Feb 28 '14 at 01:15
  • Thanks for all your replies and clarifications. I am a beginner to all these concepts, so it will take me some time to understand completely what you mean. I did try the homography decomposition, though, but wasn't able to get any good results with it. That's why I used solvePnP() to find the R and t matrices. – user2958957 Feb 28 '14 at 06:03
  • However, if I do have a homography-decomposed R and t, can you elaborate on how I can find the 3D coordinates using those matrices? Like an equation I need to solve - that would be helpful. – user2958957 Feb 28 '14 at 06:12
  • @FrancescoCallari - yes, I am fixing the world coordinate system so that my target has only one axis of rotation. Can you please elaborate on what you mean by "and fitting afterwards"? And can you also explain what you mean about scaling so that the width/height ratio is preserved? – user2958957 Feb 28 '14 at 06:34
  • I am trying to make sense of the homography approach too, and all I can figure out is this: 1. Let's say that originally, at zero rotation, you calculated the homography H1 = M, and after rotation you have H2 = M*R, where H1 maps plane coordinates to image1 and H2 maps the same zero-rotation plane coordinates to image2 of the rotated pattern. Then we have M = H1 = H2*R.inv, so R = H1.inv*H2. You should have both these homographies, so you can calculate R and apply it to your 3D points. Note that the homography gives you only the first two columns of R, but the third one is easy to calculate as r3 = r1 x r2. – Vlad Feb 28 '14 at 06:45
  • Two questions - 1. Does that M have anything to do with the intrinsic matrix? 2. H is a 3x3 matrix; which 2 rows or columns should I consider as r1 and r2? – user2958957 Feb 28 '14 at 11:06
  • RE coordinate system. The problem I was referring to is that it is relatively easy to make a rig that will rotate only about one axis without wobbling (say, keeping precession within a small fraction of a degree): you only need to machine a rigid metal rod and two pins. It's a bit harder to align a camera calibration target (say, a checkerboard) so that one of its visible "natural" axes (one particular and recognizable line on it) will spin precisely ON the mechanical axis of the rig. Errors may be quite visible at your camera resolution, depending on your setup. – Francesco Callari Feb 28 '14 at 19:21
  • So, by "fitting afterwards", I was referring to a technique whereby the target is assumed to be rotating about an a-priori unknown axis in the world (the mechanical axis of your rig). So you calibrate, and get the camera intrinsics K, and a set S = {[R, T]i, i = 1..n} of pose matrices at every image with respect to the camera frame. You then model the motion represented by S as a single rotation about the unknown axis a at distance d from the camera, and compute it with a separate optimization step. – Francesco Callari Feb 28 '14 at 19:26
  • I used your notation, so M is the intrinsic matrix. I will describe the homography in a separate answer, since there I can use HTML tags for my notation. – Vlad Feb 28 '14 at 21:45

EDIT: Vlad probably addressed your problem most accurately. The following may still help to clarify the maths in the general case.


If you know R and t, then your problem reduces to estimating (X0, Y0, Z0), the coordinates of your point in the camera frame ([X0 Y0 Z0]^T = R [X Y Z]^T + t), from the following equation, where M, u and v are known:

s_{u,v} [u v 1]^T = M [X0 Y0 Z0]^T

Notice that s_{u,v} is not a constant factor but depends on u and v. Due to the special form of M, whose last row is [0 0 1], we can easily see that s_{u,v} = Z0. Hence, if you only know M, R, t, u and v, you can only estimate (X0/Z0, Y0/Z0, 1). This means that you cannot estimate the relative depth between two different image points (they all have depth equal to one), hence you do not obtain a real 3D reconstruction.

In order to estimate the relative depth of two image points, you need at least two observations of the same point in two images (acquired by cameras at different positions). And, as pointed out by Francesco, even if you have two images, you cannot estimate the true scale of your reconstructed scene unless you additionally know the true 3D distance D between two points.
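
A quick numpy check of the claim that s_{u,v} equals Z0 (using a hypothetical intrinsic matrix; any M with last row [0 0 1] behaves the same):

```python
import numpy as np

# hypothetical intrinsic matrix; its last row is [0 0 1]
M = np.array([[800, 0, 320],
              [0, 800, 240],
              [0,   0,   1]], float)

P0 = np.array([0.4, -0.1, 2.5])   # (X0, Y0, Z0) in the camera frame
p = M @ P0
s = p[2]                          # the scale factor: exactly Z0
u, v = p[0] / s, p[1] / s

# From (u, v) alone we can only undo M, recovering the ray (X0/Z0, Y0/Z0, 1):
ray = np.linalg.solve(M, np.array([u, v, 1.0]))
```

Any point lam*P0 with lam > 0 projects to the same (u, v), which is exactly the depth ambiguity described above.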

BConic
  • It seems that almost everybody misses the point in the first sentence of the question: “I have 3D coordinates of 4 coplanar points of my target in the object coordinate system”. This is not a general structure from motion task. – Vlad Feb 27 '14 at 08:35
  • @Vlad Yes, I saw your answer after posting mine and you are probably right, I +1'd your post. However, if the OP is working with only one fixed camera, my post may clarify things a bit for him... – BConic Feb 27 '14 at 08:38

Only if you know absolute scale, for example the length of a known object visible in the image.

But do you really care for absolute scale and distance? You should think long and hard about your application, then decide whether you really need to know physical distances and sizes. It's quite hard to get it right, especially if a significant degree of accuracy is required, and most often it is not necessary.

Francesco Callari
  • Yes, I do know the length of the target I am detecting, but as I said, because my object rotates about the x axis, won't the rotation affect the length? And yes, I do need to measure the physical distance for sure. – user2958957 Feb 27 '14 at 05:28

To decompose the homography matrix (which maps a plane in the world to its image) into rotation and translation, follow these 5 steps:
1. Note that M*[R|T]*[x, y, 0, 1]^T can be simplified by getting rid of the placeholder for the Z coordinate. This is equivalent to positioning the object coordinate system on the plane in such a way that Z = 0.
2. Note that this effectively kills the third column of the rotation matrix, so that the pin-hole camera equation looks like a homography:
s[u, v, 1]^T = M * [R|T] * [x, y, 1]^T = H * [x, y, 1]^T, where [R|T] =
r11 r12 tx
r21 r22 ty
r31 r32 tz
3. Now remove the intrinsic matrix M from the homography:
M * [R|T] = H, so [R|T] = M^-1 * H = H2. Decompose the first two columns of H2 using SVD: H2 = U L V^T, then substitute L, which represents scaling, with W, which makes the scaling uniform and unit as in a true rotation; the rotation part then becomes U W V^T, where W =
1 0
0 1
0 0
4. Calculate the third column of R via the vector product r3 = r1 x r2 to guarantee that it is orthogonal to the previous two and R is a true rotation matrix.
5. You can optionally recover the translation T by factoring in a scaling factor k = sum(Rij/H2ij)/6, where i = 1..3, j = 1..2, and then calculating T = k * (the third column of H2). Finally, if Tz < 0, invert the signs of R and T.
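
The five steps can be checked with a short numpy sketch that synthesizes H from made-up ground truth (M, R, t and an arbitrary scale lam, since a homography is only defined up to scale) and then recovers R and T:

```python
import numpy as np

def rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], float)

def ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], float)

def rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)

# made-up ground truth used to synthesize H
M = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], float)
R = rz(0.3) @ ry(0.4) @ rx(0.5)
t = np.array([0.2, -0.3, 4.0])                 # tz > 0: plane in front of camera
lam = 2.0                                      # arbitrary homography scale
H = lam * M @ np.column_stack([R[:, 0], R[:, 1], t])   # Z=0 kills column 3 of R

# step 3: remove intrinsics, then orthonormalize the first two columns via SVD
H2 = np.linalg.solve(M, H)                     # M^-1 H = lam * [r1 r2 t]
U, L, Vt = np.linalg.svd(H2[:, :2])            # first two columns = U L V^T
W = np.array([[1, 0], [0, 1], [0, 0]], float)  # unit, uniform scaling
R12 = U @ W @ Vt                               # nearest orthonormal 3x2

# step 5 (scale): average the six ratios Rij / H2ij, i = 1..3, j = 1..2
k = np.mean(R12 / H2[:, :2])
if k * H2[2, 2] < 0:                           # Tz < 0: flip signs
    R12, k = -R12, -k

# step 4: third column as r3 = r1 x r2, then assemble R and T
r3 = np.cross(R12[:, 0], R12[:, 1])
R_est = np.column_stack([R12, r3])
T_est = k * H2[:, 2]
```

With exact data the recovered R_est and T_est match the ground truth; with a noisy H the SVD step still returns the nearest true rotation.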

Vlad