
Sorry for this question. I know there are many similar questions, but I really know nothing about the math involved here (I'm not a 3D programmer), and those answers are too obscure for me to implement any of them.

I also asked this question on another site first, but it got no audience there. I regret that, but I don't know how to delete it there and keep it here; I know I must not cross-post on the SE network.

I have a picture taken from a camera. I know some information about this camera and this picture:

Camera:

* FOV       : 61 deg.
* Altitude  : 91 meters
* Long.     : -43.17687898427574
* Lat.      : -22.89925324277859
* Azimuth   : 109 deg.
* Pitch     : 91 deg. ( 1 degree below the horizon )
* Far       : 10000000000
* Near      : 0.1
* FOV_Y     : 0.9522258776618419

These numbers came from Cesium. I have no access to the camera specifications, so these may be approximate values. However, Cesium was extremely accurate when calculating real-world coordinates for pixels of the image, so I think these numbers are pretty close to reality.

Picture:

* Height    : 485px
* Width     : 859px

Now, I need to know the real world geographic coordinates of a specific pixel in this image.

I've made a test with Cesium, and I was able to check some results to serve as test parameters:

* Pixel at 327 x 223 (w/h)
* Real world coordinates of that pixel: 
    Lat: -22.899739635348244
    Lon: -43.16761414545814
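As a quick sanity check of this test case (just the spherical-Earth haversine formula, nothing Cesium-specific), the snippet below computes the ground distance from the camera to that point and the corresponding depression angle for the 91 m altitude:

from math import radians, degrees, sin, cos, asin, sqrt, atan2

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, spherical-Earth approximation."""
    R = 6_371_000.0
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * R * asin(sqrt(a))

cam_lat, cam_lon = -22.89925324277859, -43.17687898427574
px_lat, px_lon = -22.899739635348244, -43.16761414545814

d = haversine_m(cam_lat, cam_lon, px_lat, px_lon)
print(round(d), "m from the camera")                    # roughly 950 m
print(round(degrees(atan2(91, d)), 2), "deg below the camera")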

Cesium is a very good tool for this job, but I need a backend solution in something like Java or Python and can't find one (maybe, due to my math limitations, I just can't tell what works and what doesn't).

I found some material that I think is very close to what I need, but I can't implement it because it is very theoretical: https://medium.com/swlh/ray-tracing-from-scratch-in-python-41670e6a96f9

Here is the picture from the test case. The red arrow points at the pixel (approximately, of course; I'm not that precise with the mouse). Keep in mind that this picture is a screenshot and may not have the same dimensions as the real one I have:

[image: camera picture from the test case, with a red arrow at the selected pixel]

... and this is a screenshot of my Cesium map of the same area (but in 3D view):

[image: Cesium 3D view of the same area]

... and the 2D view (the red mark is the point used in the test case; the blue mark is where the camera is):

[image: Cesium 2D view; red mark = test point, blue mark = camera position]

To summarize: I need to get the geographic coordinates of a pixel in a picture using Python or Java (or any language-agnostic math pseudocode for laymen that I could implement).
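From what I could piece together from those materials, the kind of calculation I'm after looks roughly like the sketch below. It is a flat-sea simplification: I'm assuming 61° is the horizontal FOV, that the "91° pitch" means 1° below the horizon, a simple pinhole camera, and a crude meters-to-degrees conversion, none of which I have verified against Cesium:

import numpy as np

def pixel_to_latlon(u, v, img_w, img_h, fov_x_deg,
                    cam_lat, cam_lon, cam_alt_m,
                    heading_deg, pitch_down_deg):
    """Cast a ray through pixel (u, v) and intersect it with a flat plane at sea level."""
    # Pinhole model: focal length in pixels from the horizontal FOV
    f = (img_w / 2.0) / np.tan(np.radians(fov_x_deg) / 2.0)

    # Ray in the camera frame: +x right, +y down (image rows), +z forward (optical axis)
    d_cam = np.array([u - img_w / 2.0, v - img_h / 2.0, f])
    d_cam = d_cam / np.linalg.norm(d_cam)

    # Camera frame -> local East/North/Up for a level camera facing north
    cam_to_enu = np.array([[1,  0, 0],    # east  =  camera x (right)
                           [0,  0, 1],    # north =  camera z (forward)
                           [0, -1, 0]])   # up    = -camera y (image down)

    p = np.radians(pitch_down_deg)        # tilt the forward axis below the horizon
    pitch_rot = np.array([[1, 0, 0],
                          [0, np.cos(p),  np.sin(p)],
                          [0, -np.sin(p), np.cos(p)]])

    h = np.radians(heading_deg)           # azimuth, clockwise from north
    yaw_rot = np.array([[np.cos(h),  np.sin(h), 0],
                        [-np.sin(h), np.cos(h), 0],
                        [0,          0,         1]])

    d_enu = yaw_rot @ pitch_rot @ cam_to_enu @ d_cam

    if d_enu[2] >= 0:
        return None                       # the ray never reaches sea level

    t = cam_alt_m / -d_enu[2]             # ray length down to the sea-level plane
    east_m, north_m = t * d_enu[0], t * d_enu[1]

    # Small-offset meters -> degrees (flat-earth approximation)
    lat = cam_lat + north_m / 111_320.0
    lon = cam_lon + east_m / (111_320.0 * np.cos(np.radians(cam_lat)))
    return lat, lon

# Example call with the camera above and a pixel near the bottom of the frame
print(pixel_to_latlon(429, 480, 859, 485, 61,
                      -22.89925324277859, -43.17687898427574, 91,
                      109, 1.0))

Note that pixels above the flat-sea horizon return None; whether these conventions match what Cesium actually does is exactly what I can't verify myself.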

EDIT:

Well... I got some help from ChatGPT and now I have this code in Python (comments translated from Portuguese):

import numpy as np

# FOV       : 61 deg.
# Altitude  : 91 meters
# Long.     : -43.17687898427574
# Lat.      : -22.89925324277859
# Azimuth   : 109 deg.
# Pitch     : 91 deg. ( 1 degree below the horizon )

# Width     : 859.358px
# Height    : 484.983px

# Pixel at 850.1561889648438 x 475.18054962158203 (w/h)

camera_fov = 61            # FOV
image_width = 859.358      # image width in pixels
image_height = 484.983     # image height in pixels
camera_height = 90.0       # camera height in meters
pitch_angle = 91.0         # camera pitch angle in degrees
yaw_angle = 109.0          # camera yaw angle in degrees
camera_position = np.array([-43.17687898427574, -22.89925324277859, camera_height])  # camera position in world coordinates

# 2D pixel coordinates in the image ( u=w  v=h )
u = 850.1561889648438     # w
v = 475.18054962158203    # h

# compute the pixel's 3D coordinates in world space
aspect_ratio = image_width / image_height
fov = np.radians( camera_fov ) # field of view in radians
y_fov = 2 * np.arctan(np.tan(fov / 2) * np.cos(np.radians(pitch_angle)))  # vertical field of view in radians
y_angle = np.radians((v / image_height - 0.5) * y_fov * 180 / np.pi + pitch_angle)  # vertical angle in radians
x_fov = 2 * np.arctan(np.tan(fov / 2) * np.cos(y_angle))  # horizontal field of view in radians
x_angle = np.radians((u / image_width - 0.5) * x_fov * 180 / np.pi)  # horizontal angle in radians
direction = np.array([-np.sin(x_angle), np.cos(x_angle) * np.sin(y_angle), np.cos(x_angle) * np.cos(y_angle)])
rotation_matrix = np.array([
    [np.cos(np.radians(yaw_angle)), -np.sin(np.radians(yaw_angle)), 0],
    [np.sin(np.radians(yaw_angle)), np.cos(np.radians(yaw_angle)), 0],
    [0, 0, 1]
])

direction = rotation_matrix @ direction
world_coords = camera_position + direction * camera_height / np.tan(y_angle)

# Real world coordinates of that pixel: 
#    Lat: -22.901614465334827
#    Lon: -43.17465101364515


print(world_coords[1],"/",world_coords[0] )

The numbers were very close to what Cesium gave me, so I ran some tests. First I found the pixel row nearest the horizon; every pixel above it is sky. Then I took the four pixels from the horizon row down to the image bottom (the two horizon corners and the two bottom corners):

[image: the four test pixels (p1 to p4) marked on the camera picture]

The results, shown as (pixel position) LAT, LON as calculated by the code:

p1 : (   1,264)  -22.40093040092381 , -41.77409577136955
p2 : ( 858,254)  -22.42828595321902 , -41.76467637515944
p3 : (   1,481)  -22.68142494320101 , -42.55301747856779 
p4 : ( 858,481)  -22.68681453552334 , -42.55116168224547
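For reference, those four results come from calling the same math for each corner pixel; here is a rough sketch of that loop, assuming the script above is wrapped into a calc_position(u, v) helper returning [lon, lat, height] (which is what I do in the edits below):

# Corner pixels: left/right on the horizon row and at the image bottom
corners = {"p1": (1, 264), "p2": (858, 254), "p3": (1, 481), "p4": (858, 481)}

for name, (u, v) in corners.items():
    lon, lat, _ = calc_position(u, v)   # calc_position wraps the script above
    print(name, (u, v), lat, ",", lon)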

Then I checked the coordinates the code gave me against the ones Cesium calculated, to see if the results are the same.

They aren't, but they were close.

I was curious about what coordinates the code was giving me, so I built a polygon from them (WKT/GeoJSON) and used the site geojson.io to draw it.
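In case it helps anyone reproduce this, a minimal sketch (assuming shapely is installed) of turning the four corners into WKT / GeoJSON for geojson.io:

import json
from shapely.geometry import Polygon, mapping

# Ring order p1 -> p2 -> p4 -> p3 as (lon, lat), so the quad does not self-intersect
ring = [(-41.77409577136955, -22.40093040092381),   # p1
        (-41.76467637515944, -22.42828595321902),   # p2
        (-42.55116168224547, -22.68681453552334),   # p4
        (-42.55301747856779, -22.68142494320101)]   # p3
viewport = Polygon(ring)

print(viewport.wkt)                    # WKT
print(json.dumps(mapping(viewport)))   # GeoJSON geometry (wrap in a Feature if geojson.io asks for one)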

[image: geojson.io map showing the computed polygon, the camera position, and the expected view triangle]

The gray area is the polygon representing the "viewport" (up to the horizon) that I built from the camera picture corners (p1 to p4) as calculated by the code.

The red arrow is where the camera is.

The blue triangle is what I think the camera picture is actually seeing.

As you can see, I'm very close. I also suspect that input values such as pitch and yaw (azimuth) may use conventions different from what Cesium uses. Any help checking the math in the code will be welcome!

I can give the ipynb file if you want to try it on Jupyter.

EDIT 2: New Thoughts

I decided to loop over variations of YAW and PITCH to see where the points would be plotted. I also added a map library to make visualization in Jupyter easier. I've concluded that the method I'm using is more or less correct. I did the following: I located the center pixel of the camera (429, 265) for the given image size (859, 484) and held it fixed while varying YAW and PITCH to see where the code would point. This showed me that I must vary PITCH in small decimal steps, starting from 90 degrees. A very large value makes the traced line suffer distortions, perhaps due to the curvature of the earth, I don't know. The following image shows the result of varying PITCH between 90 and 94 with a step of 0.1, with YAW pointing north.

[image: map with the points produced by the pitch sweep (90 to 94 degrees, step 0.1), yaw pointing north]

I concluded that a value close to 94 degrees gives me a distance of about 13 km to the farthest point, while at 90 degrees the camera points at my own feet. I will use this fixed value for PITCH from now on.
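As a cross-check on those distances (ignoring Earth curvature, refraction, and whatever Cesium does internally), a flat-sea model says the center pixel should hit the surface at d = h / tan(pitch - 90°); the small loop below prints a few values to compare against the map:

import numpy as np

h = 91.0                                  # camera height in meters
for pitch in (90.5, 91.0, 92.0, 94.0):
    theta = np.radians(pitch - 90.0)      # degrees below the horizon
    print(f"pitch {pitch}: {h / np.tan(theta):.0f} m to the sea surface")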

Here is the code for the pitch sweep:

import folium
import geopandas as gpd
from shapely.geometry import MultiPoint

coords_pitch = []
for x in (i * 0.1 for i in range(1, 40)):    # vary pitch from 90.1 to 93.9 in 0.1 steps
    pitch_angle = 90 + x
    p1 = calc_position(429, 265)             # 429, 265 = screen center pixel
    coords_pitch.append([p1[0], p1[1]])

points_geom_pitch = MultiPoint(coords_pitch)
points_pitch = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[points_geom_pitch])
m = folium.Map([-22.9149, -43.1330], zoom_start=13, tiles='cartodbpositron')
folium.GeoJson(points_pitch).add_to(m)
m                                            # display the map in Jupyter

The next step was to see how YAW responds to variations. I expected a value of 0 to point north, 90 east, 180 south, and 270 west. I found that the north-south axis was reversed, so I changed the code to use 180 minus the yaw. Here is the changed main code:

def calc_position(u, v):
    yaw_angle_inv = 180 - yaw_angle       # invert the yaw (the north-south axis came out reversed)
    aspect_ratio = image_width / image_height
    fov = camera_fov                      # field of view (left in degrees here)
    y_fov = 2 * np.arctan(np.tan(fov / 2) * np.cos(np.radians(pitch_angle)))  # vertical field of view in radians
    y_angle = np.radians((v / image_height - 0.5) * y_fov * 180 / np.pi + pitch_angle)  # vertical angle in radians
    x_fov = 2 * np.arctan(np.tan(fov / 2) * np.cos(y_angle))  # horizontal field of view in radians
    x_angle = np.radians((u / image_width - 0.5) * x_fov * 180 / np.pi)  # horizontal angle in radians
    direction = np.array([-np.sin(x_angle), np.cos(x_angle) * np.sin(y_angle), np.cos(x_angle) * np.cos(y_angle)])
    rotation_matrix = np.array([          # note: camera_height ends up inside this matrix
        [np.cos(np.radians(yaw_angle_inv)), -np.sin(np.radians(yaw_angle_inv)), camera_height],
        [np.sin(np.radians(yaw_angle_inv)), np.cos(np.radians(yaw_angle_inv)), camera_height],
        [0, 0, 0]
    ])
    direction = rotation_matrix @ direction
    world_coords = camera_position + direction * camera_height / np.tan(y_angle)
    return world_coords

... and the code to plot the YAW axes:

from shapely.geometry import LineString

# Point the camera (central pixel) at my own feet to get a start point
pitch_angle = 90
yaw_angle = 0
start_point = calc_position(429, 265)

# Loop over YAW, using the fixed PITCH of 94 deg. found before. This gives 360 points around the start point.
pitch_angle = 94
coords_yaw = []
for x in range(0, 360):
    yaw_angle = x
    p1 = calc_position(429, 265)
    coords_yaw.append([p1[0], p1[1]])

# Plot just the four cardinal axes on the map
line_geom_yaw_0   = LineString([[start_point[0], start_point[1]], coords_yaw[0]])
line_geom_yaw_90  = LineString([[start_point[0], start_point[1]], coords_yaw[90]])
line_geom_yaw_180 = LineString([[start_point[0], start_point[1]], coords_yaw[180]])
line_geom_yaw_270 = LineString([[start_point[0], start_point[1]], coords_yaw[270]])

line_yaw_0   = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[line_geom_yaw_0])
line_yaw_90  = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[line_geom_yaw_90])
line_yaw_180 = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[line_geom_yaw_180])
line_yaw_270 = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[line_geom_yaw_270])

m = folium.Map([-22.9149, -43.1330], zoom_start=13, tiles='cartodbpositron')
folium.GeoJson(line_yaw_0).add_to(m)
folium.GeoJson(line_yaw_90).add_to(m)
folium.GeoJson(line_yaw_180).add_to(m)
folium.GeoJson(line_yaw_270).add_to(m)

This is the result:

[image: map with the four cardinal yaw axes drawn from the start point]

Next step: find out why the cardinal axes are distorted, and then start working on the pixel variations, keeping the camera center at an already defined position. I have already discovered that the height of the camera is causing this distortion: a value of 0 meters makes the cardinal-axis distortion disappear. However, the values used for PITCH then appear to be attenuated quite a bit and allow larger increments without distortion. As my camera is effectively 56 m high, I need to check for these distortions when testing the pixel variations.
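One thing I still want to test, related to this distortion: in the second calc_position the camera height ends up inside the rotation matrix, while in the first version the matrix only rotated and the height was applied separately. A yaw-only rotation I could swap in for that test would look like this (just a sketch; I haven't confirmed it removes the distortion):

import numpy as np

def yaw_rotation(yaw_deg):
    """Pure rotation about the vertical axis; the camera height is applied elsewhere."""
    a = np.radians(yaw_deg)
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a),  np.cos(a), 0],
                     [0,          0,         1]])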

  • Because of all the variables involved and since the camera specifications are approximations, it would be impossible to get the _exact_ location. But I think you know that. – youngson Apr 30 '23 at 23:55
  • @syoung Thanks for pointing that out. Yes, anything that gives me something near (even if not so close) will be acceptable. I'll be working on vessel positions and I accept something within a 500 m square error (or even bigger; I have nothing so far). I can later submit the found coordinates to a GeoHash and approximate the ship position. A ship in the camera will cover many pixels anyway. – Magno C May 01 '23 at 00:13
  • I think that if I could determine the distance of any pixel from the camera coordinates, it would already be huge progress. – Magno C May 01 '23 at 00:17
  • Distance might be inferred from etalons (objects of known size) present in the picture, for example roads, windows, doors; from there you could construct a distance map (something like a depth buffer). – Spektre May 01 '23 at 06:36
  • @Spektre thanks for the suggestion, but although I have ways to know such things and had already reached that conclusion (the simplest one), I need a more flexible solution. – Magno C May 01 '23 at 19:56
  • You could also do a [Ray and ellipsoid intersection](https://stackoverflow.com/q/25470493/2521214) for a rough estimate of position (ground without elevation), similar to ray tracing. On top of that you could also add DTM (terrain) data using some global DTM model (with low precision) and do a polygon-ray intersection for the place detected by the first method. However, this would ignore buildings. – Spektre May 05 '23 at 07:01
  • Hi @Spektre, thanks for engaging. The easy part is that I'm working with ships, so I'll consider only sea level. No DTM. – Magno C May 05 '23 at 17:46
  • Then a simple raytrace with the ellipsoid (a single ray), plus/minus a sea-level offset (of the ellipsoid radii), would suffice. So if you have the camera parameters and position/orientation, you can compute the ray direction for the selected point of interest and compute the intersection .... check this [GLSL 3D Mesh back raytracer](https://stackoverflow.com/a/45140313/2521214), it more or less contains most of the stuff you need. – Spektre May 06 '23 at 07:20
  • So first I would compute the ellipsoid position/orientation (from the camera parameters) and render it to check if it's correct, and if it is, exchange that rendering for the ray trace ... – Spektre May 06 '23 at 07:23
  • Do you have some code to point me to? Is what I've done so far on the right track? – Magno C May 08 '23 at 00:16
  • @MagnoC the code in my previous 2 links is all you need (as you already have the camera properties -> matrix). Also this [How can I render an 'atmosphere' over a rendering of the Earth in Three.js?](https://stackoverflow.com/a/19659648/2521214) is doing very similar stuff to what you intend ... however you need just a single ray, not the whole screen ... – Spektre May 08 '23 at 06:16
  • WOW! There is a lot of code and math to read (you're the boss man!)... I'm going to need some time to digest that. I'll come back soon with the results. Thanks. – Magno C May 08 '23 at 14:01
