What you need is what's commonly called extrinsic calibration: a rigid transformation relating the 3D camera reference frame to the 'world' reference frame. Usually, this is done by finding known 3D points in the world reference frame and their corresponding 2D projections in the image. This is what SolvePNP does.
To find the best rotation/translation between two sets of 3D points, in the sense of minimizing the root mean square error, the solution is:
- Theory: https://igl.ethz.ch/projects/ARAP/svd_rot.pdf
- Easier explanation: http://nghiaho.com/?page_id=671
- Python code (from the easier explanation site): http://nghiaho.com/uploads/code/rigid_transform_3D.py_
So, if you want to transform 3D points from the camera reference frame, do the following:
- As you proposed, define some 3D points with known position in the world reference frame, for example (but not necessarily) with Z=0. Put the coordinates in a Nx3 matrix P.
- Get the corresponding 3D points in the camera reference frame. Put them in a Nx3 matrix Q.
- From the file defined in point 3 above, call rigid_transform_3D(P, Q). This will return a 3x3 matrix R and a 3x1 vector t.
Then, for any 3D point in the world reference frame p, as a 3x1 vector, you can obtain the corresponding camera point, q with:
q = R.dot(p)+t
EDIT: answer when 3D position of points in world are unspecified
Indeed, for this procedure to work, you need to know (or better, to specify) the 3D coordinates of the points in your world reference frame. As stated in your comment, you only know the points are in a plane but don't have their coordinates in that plane.
Here is a possible solution:
Take the selected 3D points in camera reference frame, let's call them q'i.
Fit a plane to these points, for example as described in https://www.ilikebigbits.com/2015_03_04_plane_from_points.html. The result of this will be a normal vector n. To fully specify the plane, you need also to choose a point, for example the centroid (average) of q'i.
As the points surely don't perfectly lie in the plane, project them onto the plane, for example as described in: How to project a point onto a plane in 3D?. Let's call these projected points qi.
At this point you have a set of 3D points, qi, that lie on a perfect plane, which should correspond closely to the ground plane (z=0 in world coordinate frame). The coordinates are in the camera reference frame, though.
Now we need to specify an origin and the direction of the x and y axes in this ground plane. You don't seem to have any criteria for this, so an option is to arbitrarily set the origin just "below" the camera center, and align the X axis with the camera optical axis. For this:
Project the point (0,0,0) into the plane, as you did in step 4. Call this o. Project the point (0,0,1) into the plane and call it a. Compute the vector a-o, normalize it and call it i.
o is the origin of the world reference frame, and i is the X axis of the world reference frame, in camera coordinates. Call j=nxi ( cross product). j is the Y-axis and we are almost finished.
Now, obtain the X-Y coordinates of the points qi in the world reference frame, by projecting them on i and j. That is, do the dot product between each qi and i to get the X values and the dot product between each qi and j to get the Y values. The Z values are all 0. Call these X, Y, 0 coordinates pi.
Use these values of pi and qi to estimate R and t, as in the first part of the answer!
Maybe there is a simpler solution. Also, I haven't tested this, but I think it should work. Hope this helps.