A popular question but for which I can't find a definitive answer. Math wizards we need you!
The Problem is the following
We have a unmodified photograph (not cropped) of a rectangular object taken from some angle. We don't know the real size of the rectangle or any info about that picture.
GOAL : We want to find the aspect ratio / proportions of the rectangle with only this image as input.
This question isn't linked to a particular programming language (pseudo code allowed) but I want to implement the solution in Python using NumPy.
In this photograph it's an A4 paper sheet so the code/algorithm should find a ratio of 1.4142 : 1
My Questions
- What is the best solution (See Info section below), handling edges cases ? Is there another one I not listed ?
- Is there an implementation (C++, Python, ...) in a maintained library package (OpenCV, SciPy, ...) ?
Info
After digging the internet it's seems that this problem isn't easy to solve, there is some info :
- Research Paper by Zhengyou Zhang , Li-Wei He, "Whiteboard scanning and image enhancement" http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf / https://www.microsoft.com/en-us/research/publication/2016/11/Digital-Signal-Processing.pdf
- Research Paper by Shen Cai,* Longxiang Huang, and Yuncai Liu, "Automatically obtaining the correspondences of four coplanar points for an uncalibrated camera" https://sci-hub.hkvisa.net/10.1364/AO.51.005369
- A blog post named "Aspect Ratio of a Rectangle in Perspective" by Andrew Kay https://andrewkay.name/blog/post/aspect-ratio-of-a-rectangle-in-perspective/
Others StackOverflow related questions
There is some questions on StackOverflow with the same topic/subject/goal:
- proportions of a perspective-deformed rectangle
- Calculating aspect ratio of Perspective Transform destination image
- Perspective correction of UIImage from Points
- Get image(rect on a plane) with perspective correction from camera (2d image)
BUT the answers seems to differ in the implementation(even though they claim to be based on the same research paper) and be incomplet (not always handling parallel opposite sides).
My current working implementations
Based on all the info above I implemented the two methods in Python with OpenCV & NumPy:
Method #1
def compute_aspect_ratio(image, corners):
# Based on :
# - https://andrewkay.name/blog/post/aspect-ratio-of-a-rectangle-in-perspective/
# Step 1: Get image center, will be used as origin
h, w = image.shape[:2]
origin = (w * .5, h * .5)
# Step 2: Points coords from image origin
# /!\ CAREFUL : points need to be in zig-zag order (A, B, D, C)
a = corners[0] - origin
b = corners[1] - origin
c = corners[3] - origin
d = corners[2] - origin
# Step 3: Check if the camera lie into the plane of the rectangle
# Coplanar if three points are collinear, in that case the aspect ratio cannot be computed
M = numpy.array([[b[0], c[0], d[0]], [b[1], c[1], d[1]], [1., 1., 1.]])
det = numpy.linalg.det(M)
if math.isclose(det, 0., abs_tol=.001):
# Cannot compute the aspect ratio, the caller need to check if the return value is 0.
return 0.
# Step 4: Create the matrixes
A = numpy.array([[1., 0., -b[0], 0., 0., 0.],
[0., 1., -b[1], 0., 0., 0.],
[0., 0., 0., 1., 0., -c[0]],
[0., 0., 0., 0., 1., -c[1]],
[1., 0., -d[0], 1., 0., -d[0]],
[0., 1., -d[1], 0., 1., -d[1]]], dtype=float)
B = numpy.array([[b[0]-a[0]],
[b[1]-a[1]],
[c[0]-a[0]],
[c[1]-a[1]],
[d[0]-a[0]],
[d[1]-a[1]]], dtype=float)
# Step 5: Solve it, this will give us [Ux, Uy, (Uz / λ), Vx, Vy, (Vz / λ)]
s = numpy.linalg.solve(A, B)
# Step 6: Compute λ, it's the focal length
l = 0.
l_sq = ((-(s[0] * s[3]) - (s[1] * s[4])) / (s[2] * s[5]))
if l_sq > 0.:
l = numpy.sqrt(l_sq)
# If l_sq <= 0, λ cannot be computed, two sides of the rectangle's image are parallel
# Either Uz and/or Vz is equal zero, so we leave l = 0
# Step 7: Get U & V
u = numpy.linalg.norm([s[0], s[1], (s[2] * l)])
v = numpy.linalg.norm([s[3], s[4], (s[5] * l)])
return (v / u)
Method #2
def compute_aspect_ratio(image, corners):
# Based on :
# - https://www.microsoft.com/en-us/research/publication/2016/11/Digital-Signal-Processing.pdf
# - http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf
# Step 1: Compute image center (a.k.a origin)
h, w = image.shape[:2]
origin = (w * .5, h * .5)
# Step 2: Make homeneous coordinates
# /!\ CAREFUL : points need to be in zig-zag order (A, B, D, C)
p1 = numpy.array([*corners[0], 1.])
p2 = numpy.array([*corners[1], 1.])
p3 = numpy.array([*corners[3], 1.])
p4 = numpy.array([*corners[2], 1.])
k2 = numpy.dot(numpy.cross(p1, p4), p3) / numpy.dot(numpy.cross(p2, p4), p3)
k3 = numpy.dot(numpy.cross(p1, p4), p2) / numpy.dot(numpy.cross(p3, p4), p2)
# Step 3: Computing U & V vectors, but at this point the z value of these vectors are in the form z / f
# Where f is the focal length
u = (k2 * p2) - p1
v = (k3 * p3) - p1
# Step 4: Unpack vectors to avoid using accessors
uX, uY, uZ = u
vX, vY, vZ = v
# Step 5: Check if two opposite sides are parallel (or almost parallel)
if math.isclose(uZ, .0, abs_tol=.01) or math.isclose(vZ, .0, abs_tol=.01):
aspect_ratio = numpy.sqrt((vX ** 2 + vY ** 2) / (uX ** 2 + uY ** 2))
return aspect_ratio
# Step 6: Compute the focal length
f = numpy.sqrt(numpy.abs((1. / (uZ * vZ)) * ((uX * vX - (uX * vZ + uZ * vX) * origin[0] + uZ * vZ * origin[0] * origin[0]) + (uY * vY - (uY * vZ + uZ * vY) * origin[1] + uZ * vZ * origin[1] * origin[1]))))
A = numpy.array([[f, 0., origin[0]], [0., f, origin[1]], [0., 0., 1.]]).astype('float32')
Ati = numpy.linalg.inv(numpy.transpose(A))
Ai = numpy.linalg.inv(A)
# Step 7: Calculate the real aspect ratio
aspect_ratio = numpy.sqrt(numpy.dot(numpy.dot(numpy.dot(v, Ati), Ai), v) / numpy.dot(numpy.dot(numpy.dot(u, Ati), Ai), u))
return aspect_ratio
If you have any info / help don't hesite to share!