High-level context:
In a Unity3D AR project, a machine-learning system provides correspondences between a set of 2D pixel coordinates in an image and the same set of 3D points in world coordinates. From these 2D-3D correspondences I would like to estimate the camera pose that produced the image.
Low level goal to achieve:
My research suggests using the PnP-RANSAC algorithm. Perspective-n-Point (PnP) is the name of the problem I have: finding a camera pose from matching 2D-3D points. PnP problem definition: https://en.wikipedia.org/wiki/Perspective-n-Point
What I've tried
1) I looked for a PnP solver in ARKit, but I couldn't find one, so I assume it is not exposed.
2) I tried the EmguCV asset from the Asset Store, which should let me use OpenCV within my Unity project (a minimal sketch of my call is below). OpenCV solvePnP documentation: https://docs.opencv.org/3.3.0/d9/d0c/group__calib3d.html#ga50620f0e26e02caa2e9adc07b5fbf24e
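For reference, here is a minimal sketch of roughly how I call it, assuming the EmguCV 3.x style API; the intrinsics values are placeholders rather than my real calibration:

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.Structure;
using Emgu.CV.Util;

public static class PnpSketch
{
    // worldPoints: 3D points in world/model coordinates; pixelPoints: matching 2D pixel coordinates
    public static void EstimatePose(MCvPoint3D32f[] worldPoints, PointF[] pixelPoints)
    {
        // Camera intrinsic matrix; fx, fy, cx, cy are placeholders and should come
        // from the real device calibration (e.g. the AR camera's intrinsics)
        var cameraMatrix = new Matrix<double>(3, 3);
        cameraMatrix[0, 0] = 1450; // fx (placeholder)
        cameraMatrix[1, 1] = 1450; // fy (placeholder)
        cameraMatrix[0, 2] = 640;  // cx (placeholder)
        cameraMatrix[1, 2] = 360;  // cy (placeholder)
        cameraMatrix[2, 2] = 1;

        var distCoeffs = new Matrix<double>(4, 1); // assuming no lens distortion

        var rvec = new Matrix<double>(3, 1); // output rotation vector (Rodrigues)
        var tvec = new Matrix<double>(3, 1); // output translation vector

        CvInvoke.SolvePnP(
            new VectorOfPoint3D32F(worldPoints),
            new VectorOfPointF(pixelPoints),
            cameraMatrix,
            distCoeffs,
            rvec,
            tvec);
    }
}
```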
The question:
Is there a PnP solver exposed in the ARKit framework, and if not, how do I use OpenCV's PnP solver correctly via the EmguCV C# wrapper in a Unity project (which coordinate systems to be aware of, which function parameters to provide, e.g. the camera intrinsic matrix, and how to interpret the outputs to get the camera pose right)?
Problems I encountered trying to answer the question:
Using SolvePnPRansac crashed the Unity Editor itself even though I wrapped the call in a try-catch block (probably my input arguments had unexpected formats). I've had more success with plain solvePnP, but the results are not what I expect. The documentation states that the output vectors rvec and tvec are the rotation and translation that bring the object from the model coordinate system to the camera coordinate system. So if I put the camera at (0,0,0) looking in the -z direction and place the object at tvec with the rotation given by rvec (a Rodrigues rotation vector, not Euler angles), I'd expect the rendered object to look similar to the image I used for the pixel-coordinate correspondences. Did I misunderstand that?
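Concretely, this is a sketch of how I currently turn rvec/tvec into a camera pose (again assuming the EmguCV 3.x API; this interpretation may be exactly where I go wrong):

```csharp
using Emgu.CV;

public static class PoseFromPnp
{
    // rvec/tvec are the outputs of SolvePnP: the transform from world/model
    // coordinates into camera coordinates. I invert it to get the camera pose:
    // camera rotation = R^T, camera position = -R^T * t.
    public static Matrix<double> CameraPositionInWorld(Matrix<double> rvec, Matrix<double> tvec)
    {
        var rotation = new Matrix<double>(3, 3);
        CvInvoke.Rodrigues(rvec, rotation);   // Rodrigues rotation vector -> 3x3 matrix

        var rInv = rotation.Transpose();      // inverse of a rotation matrix is its transpose
        var position = rInv * tvec;           // R^T * t (3x1)

        // Negate to get -R^T * t, still expressed in OpenCV's coordinate convention
        for (int i = 0; i < 3; i++)
            position[i, 0] = -position[i, 0];

        return position;
    }
}
```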
Suspicions I have: OpenCV's coordinate conventions state that the image y coordinate goes from top to bottom, while x points right and z points forward. I tried inverting the y axis of both the 2D and the 3D coordinates, but it didn't work (a sketch of that conversion is below).
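This is roughly what that inversion looked like; the conversion is my own assumption, not something I found documented:

```csharp
using System.Drawing;
using UnityEngine;

public static class CoordinateFlips
{
    // OpenCV's image origin is top-left with y growing downwards, so I flip the
    // pixel y coordinate (assuming my pixel coordinates use a bottom-left origin).
    public static PointF FlipPixelY(Vector2 pixel, float imageHeight)
    {
        return new PointF(pixel.x, imageHeight - pixel.y);
    }

    // Likewise I tried flipping the y axis of the 3D world points to go from
    // Unity's left-handed, y-up system towards OpenCV's y-down convention.
    public static Vector3 FlipWorldY(Vector3 point)
    {
        return new Vector3(point.x, -point.y, point.z);
    }
}
```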
Edit: [I removed my original code here because I changed it a lot since asking the question to get it to work]
(Some of many) related posts: I looked through the other 41 Stack Overflow questions tagged opencv-solvePnP, but none of them were Unity3D or C# related.
Camera pose estimation from homography or with solvePnP() function
(got no answers)
How can I estimate the camera pose with 3d-to-2d-point-correspondences (using opencv)
difference: I need to do it in a Unity3D C# project
obtaining 2d-3d point correspondences for pnp or posit
I get it, I need to use mathematical algorithms; that's the theory, but now how do I use the libraries at my disposal?