
I have 4 points marked in an equirectangular image. [Red dots]

[equirectangular image with the 4 points marked]

I also have the 4 corresponding points marked in an overhead image. [Red dots]

[overhead image with the 4 corresponding points marked]

How do I calculate where on the overhead image the camera was positioned?

So far I see there are 4 rays (R1, R2, R3, R4) extending from the unknown camera center C = (Cx, Cy, Cz), through the marked points in the equirectangular image, and ending at the pixel coordinates of the overhead image (P1, P2, P3, P4). That gives 4 vector equations of the form:

[Cx, Cy, Cz] + [Rx, Ry, Rz]*t = [x, y, 0] 

for each correspondence. So

C + R1*t1 = P1 = [x1, y1, 0]
C + R2*t2 = P2 = [x2, y2, 0]
C + R3*t3 = P3 = [x3, y3, 0]
C + R4*t4 = P4 = [x4, y4, 0]
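
In stacked matrix form this becomes the sketch below (a minimal sketch of the system I have in mind, with a separate t_i per correspondence; ray_dirs stands for the unit rays computed from the equirectangular coordinates):

import numpy as np

# Stack the 4 ray equations C + R_i * t_i = P_i into A @ v = b,
# with unknown vector v = [Cx, Cy, Cz, t1, t2, t3, t4].
def stack_system(ray_dirs, map_points):
    A = np.zeros((12, 7))
    b = np.zeros(12)
    for i, (R, P) in enumerate(zip(ray_dirs, map_points)):
        A[3*i:3*i + 3, 0:3] = np.eye(3)  # C appears in every triple of rows
        A[3*i:3*i + 3, 3 + i] = R        # each correspondence gets its own t_i
        b[3*i:3*i + 3] = P               # P_i = [x_i, y_i, 0]
    return A, b

np.linalg.lstsq(A, b, rcond=None) then gives a least-squares estimate of C and the four t_i.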

So 7 unknowns (Cx, Cy, Cz, t1, t2, t3, t4) and 12 equations. This was my attempt, but it doesn't seem to give a reasonable answer:

import numpy as np

def equi2sphere(x, y):
    # map equirectangular pixel coordinates to spherical angles:
    # theta = longitude in [-pi, pi), phi = colatitude in [0, pi]
    width = 2000
    height = 1000
    theta = 2 * np.pi * x / width - np.pi
    phi = np.pi * y / height
    return theta, phi

HEIGHT = 1000
MAP_HEIGHT = 788
#
# HEIGHT = 0
# MAP_HEIGHT = 0

# Point in equirectangular image, bottom left = (0, 0)
xs = [1190, 1325, 1178, 1333]
ys = [HEIGHT - 730,   HEIGHT - 730,  HEIGHT - 756,  HEIGHT - 760]

# import cv2
# img = cv2.imread('equirectangular.jpg')
# for x, y in zip(xs, ys):
#     img = cv2.circle(img, (x, y), 15, (255, 0, 0), -1)
# cv2.imwrite("debug_equirectangular.png", img)

# Corresponding points in overhead map, bottom left = (0, 0)
px = [269, 382, 269, 383]
py = [778, 778, 736, 737]

# import cv2
# img = cv2.imread('map.png')
# for x, y in zip(px, py):
#     img = cv2.circle(img, (x, y), 15, (255, 0, 0), -1)
# cv2.imwrite("debug_map.png", img)

As = []
bs = []
for i in range(4):

    x, y = xs[i], ys[i]

    theta, phi = equi2sphere(x, y)

    # spherical angles to Cartesian unit ray
    p = 1
    sx = p * np.sin(phi) * np.cos(theta)
    sy = p * np.sin(phi) * np.sin(theta)
    sz = p * np.cos(phi)

    print(x, y, '->', np.degrees(theta), np.degrees(phi), '->', round(sx, 2), round(sy, 2), round(sz, 2))

    block = np.array([
        [1, 0, 0, sx],
        [0, 1, 0, sy],
        [1, 0, 1, sz],
    ])

    target = np.array([px[i], py[i], 0])  # overhead pixel lies on the ground plane z = 0

    As.append(block)
    bs.append(target)



A = np.vstack(As)
b = np.hstack(bs).T
solution = np.linalg.lstsq(A, b, rcond=None)
Cx, Cy, Cz, t = solution[0]

import cv2
img = cv2.imread('map_overhead.png')

for i in range(4):

    x, y = xs[i], ys[i]

    theta, phi = equi2sphere(x, y)

    # spherical angles to Cartesian unit ray
    p = 1
    sx = p * np.sin(phi) * np.cos(theta)
    sy = p * np.sin(phi) * np.sin(theta)
    sz = p * np.cos(phi)

    pixel_x = Cx + sx * t
    pixel_y = Cy + sy * t
    pixel_z = Cz + sz * t
    print(pixel_x, pixel_y, pixel_z)
    img = cv2.circle(img, (int(pixel_x), img.shape[0] - int(pixel_y)), 15, (255,255, 0), -1)
    img = cv2.circle(img, (int(Cx), img.shape[0] - int(Cy)), 15, (0,255, 0), -1)


cv2.imwrite("solution.png", img)


# print(A.dot(solution[0]))
# print(b)

Resulting camera position (green) and projected points (teal):

[overhead map with the computed camera position and projected points]

EDIT: One bug fixed: the longitude offset in the equirectangular images is PI/4, which fixes the rotation issue, but the scale is still off somehow.
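
For reference, the conversion with that offset folded in would look like this (just a sketch; the pi/4 value is specific to this panorama's yaw):

import numpy as np

def equi2sphere(x, y, width=2000, height=1000):
    # same mapping as above, plus the pi/4 longitude offset from the EDIT
    theta = 2 * np.pi * x / width - np.pi + np.pi / 4
    phi = np.pi * y / height
    return theta, phi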

nickponline
    investigate solvePnP. do you understand the equirectangular model well enough to calculate the rays in space for those 4 image points? – Christoph Rackwitz Oct 23 '22 at 19:34
  • @ChristophRackwitz isn't solvePnP for a pinhole camera? Yes, I do understand the model well enough to calculate the 4 rays. – nickponline Oct 23 '22 at 21:50
  • @ChristophRackwitz added an edit with my approach. – nickponline Oct 24 '22 at 01:39
  • When visualizing the results, I think it is good to connect the red and teal points with lines so that the correspondence between the points can be seen. – fana Oct 24 '22 at 02:30
  • yes, solvePnP is for pinhole, but the *ideas* apply, so you should research that API, its *theory*, and adjacent theory. – Christoph Rackwitz Oct 24 '22 at 09:38
  • @ChristophRackwitz I'm familiar with solvePnP, but what would the intrinsics of the corresponding pinhole camera be? – nickponline Oct 26 '22 at 15:39
  • if you wanted a hack, you could make up a virtual camera looking down, with no lens distortion, and calculate the screen-space points of those points you defined in your equirectangular projection. then you could give those points and the virtual camera's intrinsics to the solvePnP API. that's an acceptable hack, if your points aren't spanning 180 degrees of view or more (see the sketch after these comments). – Christoph Rackwitz Oct 26 '22 at 17:32
  • I've given it some purely mathematical thought, ignoring the elevation of the camera to focus on its (x, y) position: the ratio (horizontal distance between points) / (image width) on the equirect pic is proportional to the angle of vision of the corresponding side of the rectangle in the overhead view; using these angles you can geometrically construct the center of vision. Then it should be possible to calculate (x, y) from the geometrical construction. – Swifty Oct 28 '22 at 16:07
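
A minimal sketch of the virtual-pinhole hack mentioned above, assuming the 4 rays are unit vectors already expressed in map coordinates and pointing below the horizon (rz < 0); the focal length f, the principal point c, and the helper name pose_via_solvepnp are arbitrary choices for illustration, not code from this thread:

import cv2
import numpy as np

def pose_via_solvepnp(ray_dirs, map_points, f=500.0, c=500.0):
    # Virtual camera looks straight down: camera x = world x,
    # camera y = world -y, camera z = world -z (right-handed).
    img_pts = np.array([[c + f * rx / -rz, c + f * -ry / -rz]
                        for rx, ry, rz in ray_dirs], dtype=np.float64)
    obj_pts = np.array([[x, y, 0.0] for x, y in map_points], dtype=np.float64)
    K = np.array([[f, 0, c], [0, f, c], [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)
    return (-R.T @ tvec).ravel()  # camera centre (Cx, Cy, Cz) in map coordinates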

2 Answers


EDIT: using the MAP picture's width/height for the spherical conversion gives far better results for the camera center, though the point positions are still a bit messy.

[Map with a better solution for the camera center; the points are somewhat flattened]

I took the liberty of rewriting a bit of the code, adding point identification using variables and colors (in your original code, some points were in a different order in the various lists). This is preferable if one wants to work with more data points. I chose a dict for debug purposes, but a list of N points would indeed be preferable, provided that they are correctly index-paired between the different projections.

I also adapted the coordinates to match the pictures I had, and renamed the x, y variables for my own understanding.

It is still incorrect, but there is some sort of consistency between the positions it finds.

Possible cause

OpenCV images put [0, 0] in the TOP-LEFT corner. The code below is consistent with that convention for point coordinates, but I did not change any math formula.

Maybe there is an error or inconsistency in some of the formulas. You may want to check your conventions again: signs, [0, 0] location, etc.
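
For instance, a tiny sketch of the y-flip between the two conventions (assuming integer pixel coordinates in [0, height - 1]):

def flip_y(y, image_height):
    # convert a bottom-left-origin y coordinate to OpenCV's top-left origin
    # (the flip is its own inverse)
    return image_height - 1 - y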

I don't see any input related to camera location and altitude in the formulas, which may be a source of error.

You may have a look at this project that performs equirectangular projections: https://github.com/NitishMutha/equirectangular-toolbox

from typing import Dict

import cv2
import numpy as np


def equi2sphere(x, y, width, height):
    theta = (2 * np.pi * x / width) - np.pi
    phi = (np.pi * y / height)
    return theta, phi


WIDTH = 805
HEIGHT = 374  # using stackoverflow PNG
MAP_WIDTH = 662
MAP_HEIGHT = 1056  # using stackoverflow PNG

BLUE = (255, 0, 0)
GREEN = (0, 255, 0)
RED = (0, 0, 255)
CYAN = (255, 255, 0)
points_colors = [BLUE, GREEN, RED, CYAN]

TOP_LEFT = "TOP_LEFT"
TOP_RIGHT = "TOP_RIGHT"
BOTTOM_LEFT = "BOTTOM_LEFT"
BOTTOM_RIGHT = "BOTTOM_RIGHT"


class Point:
    def __init__(self, x, y, color):
        self.x = x
        self.y = y
        self.c = color

    @property
    def coords(self):
        return (self.x, self.y)


# coords using GIMP which uses upperleft [0,0]
POINTS_ON_SPHERICAL_MAP: Dict[str, Point] = {TOP_LEFT    : Point(480, 263, BLUE),
                                             TOP_RIGHT   : Point(532, 265, GREEN),
                                             BOTTOM_LEFT : Point(473, 274, RED),
                                             BOTTOM_RIGHT: Point(535, 275, CYAN),
                                             }
# xs = [480, 532, 473, 535, ]
# ys = [263, 265, 274, 275, ]

img = cv2.imread('equirectangular.png')
for p in POINTS_ON_SPHERICAL_MAP.values():
    img = cv2.circle(img, p.coords, 5, p.c, -1)
cv2.imwrite("debug_equirectangular.png", img)

# coords using GIMP which uses upperleft [0,0]
# px = [269, 382, 269, 383]
# py = [278, 278, 320, 319]
POINTS_ON_OVERHEAD_MAP: Dict[str, Point] = {TOP_LEFT    : Point(269, 278, BLUE),
                                            TOP_RIGHT   : Point(382, 278, GREEN),
                                            BOTTOM_LEFT : Point(269, 320, RED),
                                            BOTTOM_RIGHT: Point(383, 319, CYAN),
                                            }

img = cv2.imread('map.png')

for p in POINTS_ON_OVERHEAD_MAP.values():
    img = cv2.circle(img, p.coords, 5, p.c, -1)
cv2.imwrite("debug_map.png", img)

As = []
bs = []
for point_location in [TOP_LEFT, TOP_RIGHT, BOTTOM_LEFT, BOTTOM_RIGHT]:
    x_spherical, y_spherical = POINTS_ON_SPHERICAL_MAP[point_location].coords
    theta, phi = equi2sphere(x=x_spherical, y=y_spherical, width=MAP_WIDTH, height=MAP_HEIGHT) # using the overhead map data for conversions
    # spherical angles to Cartesian unit ray
    p = 1
    sx = p * np.sin(phi) * np.cos(theta)
    sy = p * np.sin(phi) * np.sin(theta)
    sz = p * np.cos(phi)
    print(f"{x_spherical}, {y_spherical} -> {np.degrees(theta):+.3f}, {np.degrees(phi):+.3f} -> {sx:+.3f}, {sy:+.3f}, {sz:+.3f}")
    block = np.array([[1, 0, 0, sx],
                      [0, 1, 0, sy],
                      [1, 0, 1, sz], ])
    x_map, y_map = POINTS_ON_OVERHEAD_MAP[point_location].coords
    vector = np.array([x_map, y_map, 0])
    As.append(block)
    bs.append(vector)

A = np.vstack(As)
b = np.hstack(bs).T
solution = np.linalg.lstsq(A, b, rcond=None)
Cx, Cy, Cz, t = solution[0]

img = cv2.imread("debug_map.png")

for point_location in [TOP_LEFT, TOP_RIGHT, BOTTOM_LEFT, BOTTOM_RIGHT]:
    x_spherical, y_spherical = POINTS_ON_SPHERICAL_MAP[point_location].coords
    theta, phi = equi2sphere(x=x_spherical, y=y_spherical, width=MAP_WIDTH, height=MAP_HEIGHT) # using the overhead map data for conversions
    # spherical angles to Cartesian unit ray
    p = 1
    sx = p * np.sin(phi) * np.cos(theta)
    sy = p * np.sin(phi) * np.sin(theta)
    sz = p * np.cos(phi)
    pixel_x = Cx + sx * t
    pixel_y = Cy + sy * t
    pixel_z = Cz + sz * t
    print(f"{pixel_x:+0.0f}, {pixel_y:+0.0f}, {pixel_z:+0.0f}")
    img = cv2.circle(img, (int(pixel_x), int(pixel_y)), 5, POINTS_ON_SPHERICAL_MAP[point_location].c, -1)

img = cv2.circle(img, (int(Cx), int(Cy)), 4, (200, 200, 127), 3)

cv2.imwrite("solution.png", img)

[Map with my initial solution] [Debug map] [Equirectangular image] [Debug equirectangular]

LoneWanderer

To expand on my comment, here's the method I use to first calculate Cx and Cy. Cz will be determined afterwards using Cx and Cy.

Overhead View

On this overhead view, the circle is the cylinder that unrolls into the equirectangular image; A', B', C' and D' are the points that represent A, B, C, D on that image. The horizontal distances between A' and B' (etc.) are proportional to the angles A-Camera-B (etc.). Hence A'B' / circle-perimeter = A-Camera-B / 2 pi, and thus A-Camera-B = A'B' / circle-perimeter * 2 pi (the circle's perimeter being the width of the equirectangular image). Let's call this angle alpha.

Circle constructed from the angle alpha

This figure illustrates how we can determine the possible positions of the camera from the angle alpha, using the inscribed-angle properties of circles: the 3 marked angles are all equal to alpha, thus tan(alpha) = AH / O1H, hence O1H = AH / tan(alpha). We now have the coordinates of O1: (AB/2, AB/(2 tan(alpha))) (in a Cartesian coordinate system with A as origin).

Determining the camera position

By doing the same for segment [AD], we get a 2nd circle of possible positions for the camera. The intersection points of the 2 circles are A and the actual camera position.

The construction superimposed on the overhead photo

Of course the precision of the determined position is dependent on the precision of the coordinates of A', B'... on the equirectangular picture; here A' and D' are (horizontally) only 6-7 pixels apart, so there's some fluctuation.

Side view

Now to calculate Cz: on this side view, the half-circle unfolds into the pixel column containing A' in the equirectangular image. Similar to the calculation of alpha earlier, the ratio A'I / length-of-the-half-circle (which is the height of the image) is equal to tilt-angle / pi, so tilt = A'I / height * pi; on the equirectangular image, A'I is simply the vertical pixel coordinate of A'. Basic trigonometry then yields tan(tilt) = -AH/OH, so Cz = OH = -AH / tan(tilt). AH is calculated from the coordinates of H computed before.

                 ---------------------------------------------------

Here's the Python code for the calculations; for the intersections of the circles, I've used the code from this post. Note that since we know A is one of the intersections, the code could be simplified (CamPos is actually the symmetrical reflection of A across the line (O1 O2)).

The results are (Cx, Cy) relative to A, in pixels, then Cz, also in pixels. Note that the calculations only make sense if the overhead picture's dimensions are proportional to the real-world dimensions (since calculating distances only makes sense in an orthonormal coordinate system).

import math

# Equirectangular info
A_eq = (472,274)
B_eq = (542,274)
C_eq = (535,260)
D_eq = (479,260)

width = 805
height = 374

# Overhead info

A = (267,321)
B = (377,321)
C = (377,274)
D = (267,274)

Rect_width = C[0] - A[0]
Rect_height = A[1] - C[1]


# Angle of view of edge [AB]

alpha = (B_eq[0] - A_eq[0]) / width * 2 * math.pi

# Center and squared radius of the circle of camera positions related to edge [AB]

x0 = Rect_width / 2
y0 = Rect_width / (2* math.tan(alpha))
r02 = x0**2 + y0**2


# Angle of view of edge [AD]

beta = (D_eq[0] - A_eq[0]) / width * 2 * math.pi

# Center and squared radius of the circle of camera positions related to edge [AD]

x1 = Rect_height / (2* math.tan(beta))
y1 = -Rect_height / 2
r12 = x1**2 + y1**2


def get_intersections(x0, y0, r02, x1, y1, r12):
    # circle 1: center (x0, y0), squared radius r02
    # circle 2: center (x1, y1), squared radius r12

    # distance between the two centers
    d = math.sqrt((x1 - x0)**2 + (y1 - y0)**2)

    # distance from center 1 to the chord joining the two intersection points,
    # and half the length of that chord
    a = (r02 - r12 + d**2) / (2 * d)
    h = math.sqrt(r02 - a**2)

    # midpoint of the chord
    x2 = x0 + a * (x1 - x0) / d
    y2 = y0 + a * (y1 - y0) / d

    # the two intersection points
    x3 = x2 + h * (y1 - y0) / d
    y3 = y2 - h * (x1 - x0) / d
    x4 = x2 - h * (y1 - y0) / d
    y4 = y2 + h * (x1 - x0) / d

    return (round(x3, 2), round(y3, 2), round(x4, 2), round(y4, 2))


# The intersections of these 2 circles are A and the camera's base position (noted H)
inters = get_intersections(x0, y0, r02, x1, y1, r12)
H = (Cx, Cy) = (inters[2], inters[3])

print(H)

def get_elevation(camera_base, overhead_point, equirect_point):
    # tilt angle from the vertical pixel coordinate of the equirectangular point
    tilt = equirect_point[1] / height * math.pi
    # overhead point expressed in the coordinate system with A as origin
    x, y = overhead_point[0] - A[0], overhead_point[1] - A[1]
    # horizontal distance from the camera base H to that point
    base_distance = math.sqrt((camera_base[0] - x)**2 + (camera_base[1] - y)**2)
    Cz = -base_distance / math.tan(tilt)
    return Cz

print(get_elevation(H, A, A_eq))
print(get_elevation(H, B, B_eq))
print(get_elevation(H, C, C_eq))
print(get_elevation(H, D, D_eq))

# (59.66, 196.19)   # These are (Cx, Cy) relative to point A

# 185.36640516274633  # These are the values of the elevation Cz
# 183.09278981601847  # when using A and A', B and B' ...
# 176.32257112738986
# 177.7819910650333
Swifty