Here is an easier solution, since you already have the 3x3 rotation matrices R1 and R2, and the 3x1 translation vectors t1 and t2.
These express the motion from the world coordinate frame to each camera, i.e. are the matrices such that, if p is a point expressed in world coordinate frame, then the same point expressed in, say, camera 1 frame is p1 = R1 * p + t1.
The motion from camera 1 to 2 is then simply the composition of (a) the motion FROM camera 1 TO the world frame, and (b) of the motion FROM the world frame TO camera 2. You can easily compute this composition as follows:
- Form the 4x4 roto-translation matrices Qw1 = [R1 t1] and Qw2 = [ R2 t2 ], both with the 4th row equal to [0 0 0 1]. These matrices completely express the roto-translation FROM the world coordinate frame TO camera 1 and 2 respectively.
- The motion FROM camera 1 TO the world frame is simply Q1w = inv(Qw1). Here inv() is the algebraic inverse matrix, i.e. the one such that inv(X) * X = X * inv(X) = IdentityMatrix, for every nonsingular matrix X.
- The roto-translation from camera 1 to 2 is then Q12 = Q1w * Qw2, and viceversa, the one from camera 2 to 1 is Q21 = Q2w * Qw1 = inv(Qw2) * Qw1.
Once you have Q12 you can extract from it the rotation and translation parts, if you so wish, respectively from its upper 3x3 submatrix and right 3x1 sub-column.