I'm currently working on a project in which I have to estimate the 3D coordinates of 2D interest points detected with a monocular camera.
To be more precise, the input is a calibrated image sequence, and whenever a new image arrives I need to triangulate points between the previous image and the current one (treating them like the left and right views of a stereo pair) to obtain 3D points.
To do this, I'm following these steps:
- Extracting key-points in the current image
- Establishing correspondences between the current and the previous image
- Computing the essential matrix E using RANSAC and the eight-point algorithm
- Extracting the rotation matrix R and the translation vector T from E
- Computing the 3D points using triangulation via orthogonal regression
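
For reference, here is a minimal sketch of these steps using OpenCV in Python (the detector choice and parameters are placeholders for my actual code; note that cv2.findEssentialMat uses the five-point algorithm internally rather than the eight-point one, and cv2.triangulatePoints stands in for my orthogonal-regression triangulation):

```python
import cv2
import numpy as np

def triangulate_pair(prev_img, curr_img, K):
    """Triangulate 3D points from two consecutive calibrated frames.
    K is the 3x3 intrinsic matrix from the calibration."""
    # 1) Extract key-points (ORB is just a placeholder detector)
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_img, None)
    kp2, des2 = orb.detectAndCompute(curr_img, None)

    # 2) Establish correspondences between the two images
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 3) Essential matrix with RANSAC
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)

    # 4) Decompose E into R and T; note that T comes out with unit
    #    norm, i.e. the translation scale is undetermined
    _, R, T, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # 5) Triangulate the inlier correspondences
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # previous camera
    P2 = K @ np.hstack([R, T])                         # current camera
    inl = mask.ravel() > 0
    pts4d = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
    pts3d = (pts4d[:3] / pts4d[3]).T                   # de-homogenize
    return pts3d, pts2[inl], R, T
```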
The resulting 3D points are not correct when I reproject them onto the images. However, I have read that triangulated points are defined only up to an indeterminate scale factor.
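
This is roughly how I measure the reprojection error (again a sketch; pts3d, pts2d, R, and T come from the function above):

```python
import cv2
import numpy as np

def mean_reprojection_error(pts3d, pts2d, R, T, K):
    """Project the triangulated 3D points into the current image and
    compare them with the matched key-point locations, in pixels."""
    rvec, _ = cv2.Rodrigues(R)  # rotation matrix -> Rodrigues vector
    proj, _ = cv2.projectPoints(pts3d, rvec, T, K, None)
    return np.mean(np.linalg.norm(proj.reshape(-1, 2) - pts2d, axis=1))
```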
So my question is: what does "up to scale" mean in this context? And how can I obtain the real 3D points in the scene's world coordinate frame?
I would be grateful for any help!