The variable s has a specific meaning: there is a one-to-many correspondence between a 2D point and its 3D back-projection. In other words, there is an infinite number of possible 3D points lying on the ray that terminates in, or is emitted from, a pixel in the direction (u, v, f). That is all s is about: it is just an indicator of this one-to-many relationship.
It seems that Francesco is talking about the general case of structure from motion, where a metric reconstruction is ambiguous up to scale. Your question, however, is probably quite different. Let me rephrase it, and tell me if I got it right: you have a static object coordinate system which you know. You have a target that rotates in this system around the X axis, and you know the 3D coordinates of 4 points in this system at zero rotation. To get the new 3D coordinates after rotation, all you need is the rotation angle, and what you are given is a set of 2D projections of your known points. This is an EASY task, if that is what you are really after.
Why is the task easy? Every point generates two constraints, as in u= v=; the number of unknowns is one (the angle), so one point is enough to calculate it. Knowing this angle, you can rotate your known 3D points to update their coordinates. Overall, only 1 point is enough to solve the task:
- Multiply both sides of the pinhole camera equation from the left by the inverse of the intrinsic matrix to get rid of the intrinsic parameters. You will end up with this:
s' [u' v' 1]^T = A [X Y Z]^T + t, where A = R*Ralpha
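As a rough illustration of this normalization step, here is a minimal NumPy sketch. The intrinsic matrix values and the names `K_intr` and `normalize_pixel` are made up for the example (the name `K_intr` is used so it is not confused with the system matrix K further down).

```python
import numpy as np

# Hypothetical intrinsic matrix (focal length 800 px, principal point 320, 240).
K_intr = np.array([[800.0,   0.0, 320.0],
                   [  0.0, 800.0, 240.0],
                   [  0.0,   0.0,   1.0]])

def normalize_pixel(u, v, K_intr):
    """Return (u', v') from K_intr^-1 [u, v, 1]^T, scaled so the third entry is 1."""
    x = np.linalg.solve(K_intr, np.array([u, v, 1.0]))
    return x[0] / x[2], x[1] / x[2]

u_n, v_n = normalize_pixel(400.0, 200.0, K_intr)
```

With these assumed intrinsics, the pixel (400, 200) maps to the normalized coordinates (0.1, -0.05).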
Technically Ralpha, our unknown, depends on the angle alpha only, but since the dependence is non-linear we can instead use a linear multiplication by a matrix with 2 unknown entries: s = sin(alpha) and c = cos(alpha), where alpha is the angle of rotation around the x axis (this s is not the scale factor s from above):
Ralpha =
[ 1  0  0 ]
[ 0  c -s ]
[ 0  s  c ]
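A minimal NumPy sketch of this parametrization follows. The helper name `rot_x`, the identity used for the known rotation R, and the 30 degree angle are assumptions chosen only for illustration.

```python
import numpy as np

def rot_x(alpha):
    """Rotation by angle alpha (radians) around the x axis."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

# Hypothetical known object-to-camera rotation (identity, for simplicity).
R = np.eye(3)
alpha = np.deg2rad(30.0)

# Combined rotation A = R * Ralpha from the camera equation.
A = R @ rot_x(alpha)
```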
Get rid of s' by noting that s' = a31X + a32Y + a33Z + tz
and plugging it into the two constraints:
s'u' = (a31X + a32Y + a33Z + tz)u' = a11X + a12Y + a13Z + tx
s'v' = (a31X + a32Y + a33Z + tz)v' = a21X + a22Y + a23Z + ty
Finding the matrix A is now a simple task of solving a linear system of equations Kx = b, where, after rearranging terms, we have
b = [tx - tz*u', ty - tz*v']^T,
x = [a11, a12, a13, a21, a22, a23, a31, a32, a33]^T,
and for a single point correspondence K is
[ -X, -Y, -Z,  0,  0,  0, Xu', Yu', Zu' ]
[  0,  0,  0, -X, -Y, -Z, Xv', Yv', Zv' ]
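More rows can be added if there are more correspondences.
Solving this with the pseudo-inverse gives x = (K^T K)^-1 K^T b,
which can be further refined via non-linear minimization of the squared residuals. Note that this closed form needs K^T K to be invertible, which for nine unknowns takes at least five correspondences; with fewer points, the known structure of A = R*Ralpha has to be used to reduce the number of unknowns.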
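The linear step can be sketched as follows. This is only an illustration under stated assumptions: the function names `build_system` and `solve_A` are made up, R is taken as the identity, the data is synthetic and noise-free, and `np.linalg.lstsq` is used in place of the explicit (K^T K)^-1 K^T b formula because it is numerically safer.

```python
import numpy as np

def build_system(points_3d, uv_norm, t):
    """Stack two rows of K and two entries of b per point correspondence."""
    tx, ty, tz = t
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(points_3d, uv_norm):
        rows.append([-X, -Y, -Z, 0, 0, 0, X * u, Y * u, Z * u])
        rows.append([0, 0, 0, -X, -Y, -Z, X * v, Y * v, Z * v])
        rhs.extend([tx - tz * u, ty - tz * v])
    return np.array(rows, dtype=float), np.array(rhs, dtype=float)

def solve_A(points_3d, uv_norm, t):
    """Least-squares estimate of the 3x3 matrix A from K x = b."""
    K, b = build_system(points_3d, uv_norm, t)
    x, *_ = np.linalg.lstsq(K, b, rcond=None)
    return x.reshape(3, 3)

# Synthetic check with made-up values: R = identity, alpha = 25 degrees.
rng = np.random.default_rng(0)
alpha_true = np.deg2rad(25.0)
c, s = np.cos(alpha_true), np.sin(alpha_true)
A_true = np.array([[1.0, 0.0, 0.0],
                   [0.0,   c,  -s],
                   [0.0,   s,   c]])
t = np.array([0.1, -0.2, 5.0])

P = rng.uniform(-1.0, 1.0, size=(8, 3))   # 8 non-coplanar object points
cam = P @ A_true.T + t                     # points in the camera frame
uv = cam[:, :2] / cam[:, 2:3]              # normalized image coordinates (u', v')

A_est = solve_A(P, uv, t)
```

Eight points are used here so that the nine-unknown system is overdetermined; the translation t is assumed known, as in the derivation.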
After you have calculated x and used it to reassemble A, you have to make sure that it is a true rotation matrix. Normally this is done through SVD: A = U L V^T, and then reassigning A = U V^T. Finally, get Ralpha = R^T A, which gives you a rotation matrix that you can apply to your known 3D coordinates to get their new values in the object coordinate system, or use the whole matrix A to get them in the camera coordinate system.
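A possible sketch of this clean-up step is shown below. The function names, the example rotation R (10 degrees around z), and the noise level are assumptions for illustration; the determinant check is an extra safeguard against reflections that is not part of the description above.

```python
import numpy as np

def nearest_rotation(A):
    """Project A onto the closest rotation matrix via SVD (A = U L V^T -> U V^T)."""
    U, _, Vt = np.linalg.svd(A)
    R = U @ Vt
    if np.linalg.det(R) < 0:      # safeguard: avoid returning a reflection
        U[:, -1] *= -1
        R = U @ Vt
    return R

def recover_ralpha(A_est, R):
    """Return Ralpha = R^T A and the rotation angle alpha around the x axis."""
    A_rot = nearest_rotation(A_est)
    Ralpha = R.T @ A_rot
    alpha = np.arctan2(Ralpha[2, 1], Ralpha[1, 1])   # atan2(sin, cos)
    return Ralpha, alpha

# Made-up example: known R, true angle of 40 degrees, slightly noisy estimate of A.
th = np.deg2rad(10.0)
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0,         0.0,        1.0]])
alpha_true = np.deg2rad(40.0)
c, s = np.cos(alpha_true), np.sin(alpha_true)
Ralpha_true = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

rng = np.random.default_rng(1)
A_noisy = R @ Ralpha_true + 1e-3 * rng.standard_normal((3, 3))

Ralpha_est, alpha_est = recover_ralpha(A_noisy, R)
```

Applying `Ralpha_est` to the known 3D points gives their updated object-frame coordinates.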
This may look messy, but it is a typical set of steps for getting, say, extrinsic camera parameters, and you have already done this (though you probably used a library function).
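To illustrate the claim that a single point can be enough, here is a reduced variant, which is not the nine-unknown system above. It assumes R and t are known and uses A = R*Ralpha to keep only c and s as unknowns, so the two constraints of one point give a 2x2 linear system. The function name `angle_from_one_point` and all numeric values are hypothetical, and the point must not lie on the rotation axis.

```python
import numpy as np

def angle_from_one_point(P, uv_norm, R, t):
    """Solve the two projection constraints of one point for c and s."""
    X, Y, Z = P
    u, v = uv_norm
    # Ralpha P = [X, 0, 0] + c [0, Y, Z] + s [0, -Z, Y]
    q0 = R @ np.array([X, 0.0, 0.0]) + t   # part independent of the angle
    qc = R @ np.array([0.0, Y, Z])         # part multiplied by c
    qs = R @ np.array([0.0, -Z, Y])        # part multiplied by s
    # u' * z_cam = x_cam and v' * z_cam = y_cam, rearranged as M [c, s]^T = d
    M = np.array([[u * qc[2] - qc[0], u * qs[2] - qs[0]],
                  [v * qc[2] - qc[1], v * qs[2] - qs[1]]])
    d = np.array([q0[0] - u * q0[2], q0[1] - v * q0[2]])
    c, s = np.linalg.solve(M, d)
    return np.arctan2(s, c)

# Made-up example values.
th = np.deg2rad(10.0)
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0,         0.0,        1.0]])
t = np.array([0.1, -0.2, 5.0])
alpha_true = np.deg2rad(25.0)
c, s = np.cos(alpha_true), np.sin(alpha_true)
Ralpha = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

P = np.array([0.5, 0.3, -0.2])
cam = R @ Ralpha @ P + t
uv = cam[:2] / cam[2]

alpha_est = angle_from_one_point(P, uv, R, t)
```

With noisy measurements, c and s will not satisfy c^2 + s^2 = 1 exactly; taking the angle with `arctan2` sidesteps that, and more points can be stacked into the same two-unknown system.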