I'm trying to write an app that allows users to georeference an image of a map by clicking waypoints on the map, thereby determining pixel coordinates x and y, and then clicking on the same waypoint on a 'real' map, giving a longitude lng and latitude lat.
I'd like to take it as given that both the image and 'real' map are oriented north-south, so that there is no rotation between them, and also that there is a single scale for north-south and east-west. That is:
lng(x) = scale * x + a
lat(y) = -scale * y + b
(The reason for the minus sign is that the y pixel coordinate increases from the top of the image to the bottom, whereas latitude lat increases from south to north.)
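To make that model concrete, here is a minimal sketch of the mapping I have in mind; the values of scale, a and b below are made up purely for illustration, not fitted to anything:
# Illustrative (made-up) parameters for lng(x) = scale*x + a, lat(y) = -scale*y + b
scale = 3.5e-05
a = -119.69
b = 37.77

def pixel_to_lnglat(x, y):
    # Pure scaling plus translation: no rotation, no shear,
    # and a single scale shared by both axes.
    return scale * x + a, -scale * y + b

print(pixel_to_lnglat(610, 1673))  # the lng/lat this example transform assigns to pixel (610, 1673)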
I've adapted the answer to "how to perform coordinates affine transformation using python? part 2" as follows:
import numpy as np
coords = [
{"pixel": {"x": 610, "y": 1673}, "lnglat": {
"lng": -119.66622566136141, "lat": 37.71690293889708}},
{"pixel": {"x": 3616, "y": 948}, "lnglat": {
"lng": -119.55987333997541, "lat": 37.739791632115}},
{"pixel": {"x": 156, "y": 1582}, "lnglat": {
"lng": -119.68242540789811, "lat": 37.719168689576634}},
{"pixel": {"x": 1432, "y": 1079}, "lnglat": {
"lng": -119.63773163590452, "lat": 37.733899511112554}},
{"pixel": {"x": 1467, "y": 982}, "lnglat": {
"lng": -119.6365899951677, "lat": 37.73740878429034}},
{"pixel": {"x": 2045, "y": 464}, "lnglat": {
"lng": -119.61643210247348, "lat": 37.75263501532096}},
{"pixel": {"x": 2530, "y": 225}, "lnglat": {
"lng": -119.59904847563081, "lat": 37.759640099263024}},
{"pixel": {"x": 3611, "y": 217}, "lnglat": {
"lng": -119.57440674003465, "lat": 37.769372182124215}},
{"pixel": {"x": 4218, "y": 289}, "lnglat": {
"lng": -119.53927620600871, "lat": 37.7590418448261}},
{"pixel": {"x": 4972, "y": 819}, "lnglat": {
"lng": -119.51283799895947, "lat": 37.7451015130886}},
{"pixel": {"x": 4869, "y": 1178}, "lnglat": {
"lng": -119.5150031101931, "lat": 37.73452849532761}},
{"pixel": {"x": 4858, "y": 1268}, "lnglat": {
"lng": -119.51537412573026, "lat": 37.731943969799104}},
{"pixel": {"x": 4637, "y": 1307}, "lnglat": {
"lng": -119.52293169964986, "lat": 37.730726899819345}},
{"pixel": {"x": 4284, "y": 1599}, "lnglat": {
"lng": -119.53520554208092, "lat": 37.72240153076238}},
{"pixel": {"x": 4150, "y": 1676}, "lnglat": {
"lng": -119.53996905111126, "lat": 37.71984653680312}},
{"pixel": {"x": 3432, "y": 1989}, "lnglat": {
"lng": -119.56520552108367, "lat": 37.70994983543632}},
{"pixel": {"x": 2965, "y": 1408}, "lnglat": {
"lng": -119.58234774459186, "lat": 37.72663636959598}},
{"pixel": {"x": 2560, "y": 1921}, "lnglat": {
"lng": -119.59584076119313, "lat": 37.712008849961066}},
{"pixel": {"x": 1840, "y": 1593}, "lnglat": {
"lng": -119.6231396666414, "lat": 37.72018991118786}},
{"pixel": {"x": 1140, "y": 1590}, "lnglat": {
"lng": -119.64782744839357, "lat": 37.71938854312988}},
]
pixel_coordinates = np.array([[coord['pixel']['x'], coord['pixel']['y']] for coord in coords])
lnglat_coordinates = np.array([[coord['lnglat']['lng'], coord['lnglat']['lat']] for coord in coords])
# Pad the data with ones, so that our transformation can do translations too
n = pixel_coordinates.shape[0]
def pad(x):
    return np.hstack([x, np.ones((x.shape[0], 1))])
def unpad(x):
    return x[:, :-1]
X = pad(pixel_coordinates)
Y = pad(lnglat_coordinates)
# Solve the least squares problem X * A = Y
# to find our transformation matrix A
A, res, rank, s = np.linalg.lstsq(X, Y, rcond=None)
def transform(x):
    return unpad(np.dot(pad(x), A))
print("Target:")
print(lnglat_coordinates)
print("Result:")
print(transform(pixel_coordinates))
print("Max error:", np.abs(lnglat_coordinates - transform(pixel_coordinates)).max())
print(A)
The resulting matrix A looks like this:
[[ 3.55857577e-05 8.98377941e-07 2.81630741e-18]
[ 3.43101520e-06 -2.97714115e-05 -8.56519716e-18]
[-1.19693144e+02 3.77657997e+01 1.00000000e+00]]
I would expect it to have the form
[[scale 0 0]
[0 -scale 0]
[a b 1]]
What I notice is that the two independently derived values of scale (A[0,0] and -A[1,1]) actually differ by ~16%, and that the off-diagonal entries, which are supposed to be zero, aren't.
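For reference, the two scale estimates can be read straight out of the fitted matrix A above:
# The two scale estimates implied by the fitted matrix A
scale_x = A[0, 0]    # d(lng)/d(x)
scale_y = -A[1, 1]   # d(lat)/d(y); sign flipped because pixel y grows downwards
print(scale_x, scale_y, abs(scale_x - scale_y) / max(scale_x, scale_y))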
(In practice, I also notice that the overlay computed with this algorithm is significantly off, as shown below with partial opacity. Note how the roads appear to be shifted to the northwest).
Is there any way to impose these constraints on the least-squares estimation of the affine transformation - that is, to ensure that the transformation consists of scaling and translation only (no skewing or rotation)?
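To make concrete what I mean by imposing these constraints: rather than fitting a full 3x3 affine matrix, I would like to solve for just the three unknowns scale, a and b. Below is a rough, untested sketch of the kind of formulation I have in mind (each waypoint contributes one equation for lng and one for lat), but I don't know whether this is the right or most idiomatic way to set it up with numpy:
# Sketch of a constrained fit with only three unknowns: scale, a, b.
# Each waypoint contributes two equations:
#   lng_i =  scale * x_i + a
#   lat_i = -scale * y_i + b
M = np.zeros((2 * n, 3))
v = np.zeros(2 * n)
M[0::2, 0] = pixel_coordinates[:, 0]   # x_i multiplies scale
M[0::2, 1] = 1.0                       # coefficient of a
M[1::2, 0] = -pixel_coordinates[:, 1]  # -y_i multiplies scale
M[1::2, 2] = 1.0                       # coefficient of b
v[0::2] = lnglat_coordinates[:, 0]     # lng_i
v[1::2] = lnglat_coordinates[:, 1]     # lat_i
(scale, a, b), *_ = np.linalg.lstsq(M, v, rcond=None)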