
I am working on a project involving an aircraft equipped with a camera. The objective is to detect circular, color-tagged targets placed on the ground and find their GPS coordinates from an image. The following variables are known:

  • The camera is fixed to the plane, facing directly down when the aircraft attitude (pitch/roll) is zero. The camera is mounted so that the image Y axis points towards the nose of the aircraft.
  • Aircraft altitude (from a barometer), pitch and roll (accelerometer readings), as well as bearing (from magnetometer readings).
  • The target radius is known and constant, but, if possible, I'd rather not rely on it for the algorithm to work.
  • Pitch and roll are not always zero. I have already found a working solution for when the aircraft is in horizontal flight with zero banking and elevation.
  • Assume the ground is flat.

So I'm basically trying to generalize my solution in order to account for the aircraft pitch and roll.

I've taken a look at the pinhole camera model and even tried to derive my own solution based on trigonometry, but I feel like there should be a more elegant solution to this problem.

The pinhole camera model gives the following formula to translate world coordinates to image coordinates (pixels):

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [\, R \mid t \,] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} $$

where $K$ is the 3×3 intrinsic parameters matrix and $[R \mid t]$ is the 3×4 extrinsic parameters matrix.

Based on this formula, it is impossible to convert image coordinates back to world coordinates, as depth information is lost (and the extrinsic parameters matrix is not invertible, since it is not square). However, we do know the aircraft's altitude, and we know that the ground is horizontal and flat, so we can set Z = 0. This means the third column of the extrinsic parameters matrix is no longer needed, and the remaining 3×3 matrix becomes invertible, so we can move it to the other side of the equation and isolate the X, Y coordinates. The problem is that the extrinsic parameters matrix depends on the camera orientation and position, and it seems it can only be obtained by calibration with chessboard patterns, so I cannot use this method in my use case... Am I missing something?
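For reference, here is a minimal sketch of what that inversion could look like if the extrinsic matrix were built directly from the attitude and altitude readings instead of a chessboard calibration. The frame conventions (NED world frame, body frame with x toward the nose, the camera-to-body mounting matrix) and the helper `ground_xy_from_pixel` are assumptions for illustration and would need to be matched to the real setup; the intrinsic matrix `K` would still come from a one-time intrinsic calibration:

```python
import numpy as np

def ground_xy_from_pixel(u, v, K, roll, pitch, yaw, altitude):
    """Map a pixel (u, v) to ground coordinates (X, Y) on the flat plane Z = 0.

    Conventions assumed here (adjust to your own setup):
      * world frame: X north, Y east, Z down (NED), ground plane at Z = 0
      * camera centre at world position (0, 0, -altitude)
      * body frame: x toward the nose, y toward the right wing, z down
      * camera frame: optical axis = body z, image Y axis = body x (toward the nose)
      * roll, pitch, yaw in radians, aerospace Z-Y-X order
    """
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)

    # body -> world rotation (yaw about Z, then pitch about Y, then roll about X)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    R_wb = Rz @ Ry @ Rx

    # camera -> body rotation for the mounting described above
    # (columns are the camera x, y, z axes expressed in the body frame)
    R_bc = np.array([[0, 1, 0],
                     [-1, 0, 0],
                     [0, 0, 1]], dtype=float)

    R_wc = R_wb @ R_bc                    # camera -> world
    R = R_wc.T                            # world -> camera (extrinsic rotation)
    C = np.array([0.0, 0.0, -altitude])   # camera centre in world coordinates
    t = -R @ C                            # extrinsic translation

    # For points on Z = 0 the projection collapses to a 3x3 homography:
    # s * [u, v, 1]^T = K [r1 r2 t] [X, Y, 1]^T
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))

    ground = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return ground[0] / ground[2], ground[1] / ground[2]
```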

Otherwise, my trigonometric solution is described in this drawing: Side view of the camera FOV with image plane, camera plane and ground plane where d is the distance on the ground from the camera (and the aircraft). Does this solution seem reasonable? I will attempt to test it soon...
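For comparison, the same geometry can also be expressed as a ray–ground intersection: cast the viewing ray through the target pixel and scale it until it hits Z = 0, which is essentially the trigonometric construction in vector form. A minimal sketch under the same assumed conventions as above (the helper `ray_ground_intersection` is hypothetical, `R_wc` is the camera-to-world rotation built as in the previous sketch, and `K` is the intrinsic matrix):

```python
import numpy as np

def ray_ground_intersection(u, v, K, R_wc, altitude):
    """Intersect the viewing ray through pixel (u, v) with the flat ground Z = 0.

    Assumes the same conventions as the previous sketch: NED world frame with
    Z down, ground at Z = 0, camera centre at (0, 0, -altitude), and R_wc the
    camera->world rotation built from pitch/roll/bearing.
    """
    # back-project the pixel into a ray direction in the camera frame
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # express the ray direction in world coordinates
    d_world = R_wc @ d_cam

    if d_world[2] <= 0:
        raise ValueError("Ray does not point toward the ground (pixel above the horizon).")

    C = np.array([0.0, 0.0, -altitude])   # camera centre in world coordinates
    s = altitude / d_world[2]             # scale so that Z = 0 at the intersection
    P = C + s * d_world
    return P[0], P[1]                     # ground X (north), Y (east)
```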

  • Use a digital terrain image registered to your world coordinates. Send a ray from the camera through the image towards the ground. Iterate until the ray intersects the ground. – fmw42 Aug 27 '23 at 21:02
  • What's a digital terrain image? I looked it up and I don't think I see what you mean – DrCube Aug 28 '23 at 17:27
  • If you assume the locations of your targets are on the ground (Z=0) in world coordinates, then you would not need a terrain map. If the ground is not flat and the targets are spaced far apart, then a terrain map would be needed to know or find the elevation of each target. – fmw42 Aug 28 '23 at 17:41
  • If you assume the ground is flat at Z=0, then you do not need to iterate to get the location of your targets in 3D from your 2D coordinates. Searching Google, I find https://stackoverflow.com/questions/51272055/opencv-unproject-2d-points-to-3d-with-known-depth-z. Is that what you want? – fmw42 Aug 28 '23 at 17:51

0 Answers