Transforming captured co-ordinates into screen co-ordinates

Question

I think this is probably a simple maths question but I have no idea what's going on right now.

I'm capturing the positions of "markers" on a webcam and I have a list of markers and their co-ordinates. Four of the markers are the outer corners of a work surface, and the fifth (green) marker is a widget. Like this:

Here's some example data:

Top left marker (a=98, b=86)
Top right marker (c=119, d=416)
Bottom left marker (e=583, f=80)
Bottom right marker (g=569, h=409)
Widget marker (x=452, y=318)

I'd like to somehow transform the webcam's widget position into a co-ordinate to display on the screen, where top left is 0,0 not 98,86 and somehow take into account the warped angles from the webcam capture.

Where would I even begin? Any help appreciated

What language are you working with? – Jacob Nov 06 '09 at 20:48 — Jacob, Nov 06 '09 at 20:48

score 20 · Answer 1 · edited Sep 24 '22 at 18:12

In order to compute the warping, you need to compute a homography between the four corners of your input rectangle and the screen.

Since your webcam polygon seems to have an arbitrary shape, a full perspective homography can be used to convert it to a rectangle. It's not that complicated, and you can solve it with a mathematical function (should be easily available) known as Singular Value Decomposition or SVD.

Background information:

For planar transformations like this, you can easily describe them with a homography, which is a 3x3 matrix H such that if any point on or in your webcam polygon, say x1 were multiplied by H, i.e. H*x1, we would get a point on the screen (rectangular), i.e. x2.

Now, note that these points are represented by their homogeneous coordinates which is nothing but adding a third coordinate (the reason for which is beyond the scope of this post). So, suppose your coordinates for X1 were, (100,100), then the homogeneous representation would be a column vector x1 = [100;100;1] (where ; represents a new row).

Ok, so now we have 8 homogeneous vectors representing 4 points on the webcam polygon and the 4 corners of your screen - this is all we need to compute a homography.

Computing the homography:

A little math: I'm not going to get into the math, but briefly this is how we solve it:

We know that 3x3 matrix H,

H = 

h11 h12 h13
h21 h22 h23
h31 h32 h33

where hij represents the element in H at the ith row and the jth column

can be used to get the new screen coordinates by x2 = H*x1. Also, the result will be something like x2 = [12;23;0.1] so to get it in the screen coordinates, we normalize it by the third element or X2 = (120,230) which is (12/0.1,23/0.1).

So this means each point in your webcam polygon (WP) can be multiplied by H (and then normalized) to get your screen coordinates (SC), i.e.

SC1 = H*WP1
SC2 = H*WP2
SC3 = H*WP3
SC4 = H*WP4
where SCi refers to the ith point in screen coordinates and 
      WPi means the same for the webcam polygon

Computing H: (the quick and painless explanation)

Pseudocode:

for n = 1 to 4
{
    // WP_n refers to the 4th point in the webcam polygon 
    X = WP_n;

    // SC_n refers to the nth point in the screen coordinates
    // corresponding to the nth point in the webcam polygon

    // For example, WP_1 and SC_1 is the top-left point for the webcam
    // polygon and the screen coordinates respectively.

    x = SC_n(1); y = SC_n(2);

    // A is the matrix which we'll solve to get H
    // A(i,:) is the ith row of A

    // Here we're stacking 2 rows per point correspondence on A
    // X(i) is the ith element of the vector X (the webcam polygon coordinates, e.g. (120,230)
    A(2*n-1,:) = [0 0 0 -X(1) -X(2) -1 y*X(1) y*X(2) y];    
    A(2*n,:)   = [X(1) X(2) 1 0 0 0 -x*X(1) -x*X(2) -x];
}

Once you have A, just compute svd(A) which will give decompose it into U,S,V^T (such that A = USV^T). The vector corresponding to the smallest singular value is H (once you reshape it into a 3x3 matrix).

With H, you can retrieve the "warped" coordinates of your widget marker location by multiplying it with H and normalizing.

Example:

In your particular example if we assume that your screen size is 800x600,

WP =

    98   119   583   569
    86   416    80   409
     1     1     1     1

SC =

     0   799     0   799
     0     0   599   599
     1     1     1     1

where each column corresponds to corresponding points.

Then we get:

H = 
   -0.0155   -1.2525  109.2306
   -0.6854    0.0436   63.4222
    0.0000    0.0001   -0.5692

Again, I'm not going into the math, but if we normalize H by h33, i.e. divide each element in H by -0.5692 in the example above,

H =
    0.0272    2.2004 -191.9061
    1.2042   -0.0766 -111.4258
   -0.0000   -0.0002    1.0000

This gives us a lot of insight into the transformation.

[-191.9061;-111.4258] defines the translation of your points (in pixels)
[0.0272 2.2004;1.2042 -0.0766] defines the affine transformation (which is essentially scaling and rotation).
The last 1.0000 is so because we scaled H by it and
[-0.0000 -0.0002] denotes the projective transformation of your webcam polygon.

Also, you can check if H is accurate my multiplying SC = H*WP and normalizing each column with its last element:

SC = H*WP    

    0.0000 -413.6395         0 -411.8448
   -0.0000    0.0000 -332.7016 -308.7547
   -0.5580   -0.5177   -0.5554   -0.5155

Dividing each column, by it's last element (e.g. in column 2, -413.6395/-0.5177 and 0/-0.5177):

SC
   -0.0000  799.0000         0  799.0000
    0.0000   -0.0000  599.0000  599.0000
    1.0000    1.0000    1.0000    1.0000

Which is the desired result.

Widget Coordinates:

Now, your widget coordinates can be transformed as well H*[452;318;1], which (after normalizing is (561.4161,440.9433).

So, this is what it would look like after warping: Warped

As you can see, the green + represents the widget point after warping.

Notes:

There are some nice pictures in this article explaining homographies.
You can play with transformation matrices here

MATLAB Code:

WP =[
    98   119   583   569
    86   416    80   409
     1     1     1     1
     ];

SC =[
     0   799     0   799
     0     0   599   599
     1     1     1     1
     ];    

A = zeros(8,9);  

for i = 1 : 4     
    X = WP(:,i);    
    x = SC(1,i); y = SC(2,i);        
    A(2*i-1,:) = [0 0 0 -X(1) -X(2) -1 y*X(1) y*X(2) y];        
    A(2*i,:)   = [X(1) X(2) 1 0 0 0 -x*X(1) -x*X(2) -x];
end

[U S V] = svd(A);

H = transpose(reshape(V(:,end),[3 3]));
H = H/H(3,3);

A

       0           0           0         -98         -86          -1           0           0           0
      98          86           1           0           0           0           0           0           0
       0           0           0        -119        -416          -1           0           0           0
     119         416           1           0           0           0      -95081     -332384        -799
       0           0           0        -583         -80          -1      349217       47920         599
     583          80           1           0           0           0           0           0           0
       0           0           0        -569        -409          -1      340831      244991         599
     569         409           1           0           0           0     -454631     -326791        -799

Thanks for the very detailed answer. I've read it about 10 times now and it's starting to make some sense! Googling for code doesn't bring up much, but I have managed to find PVectors in Processing (http://processing.org/reference/PVector.html) - I assume this is what I need? I'll keep on plugging away and see how far I get! — Al., Nov 06 '09 at 21:25
I'm glad you didn't give up - this stuff is quite simple once you understand it. I'll keep posting updates when I think of something but please feel free to ask questions. — Jacob, Nov 06 '09 at 21:47
And yes, PVectors will help with that, but you'll need something to represent matrices as well. Check out this SO post on doing SVD with PHP, http://stackoverflow.com/questions/960060/singular-value-decomposition-svd-in-php — Jacob, Nov 06 '09 at 22:54
All this may seem very complicated compared to other methods, but it's useful since you can transform several points in the polygon once you find `H` with just matrix multiplication, and if you have access to a good linear algebra library, it's just a few operations. Else, I'd recommend you go for other methods like the one with the triangles. — Jacob, Nov 07 '09 at 02:20
@Al this seems like a very nice answer and you seem to be happy with it, so why not mark it as accepted? — Ivan, Nov 09 '09 at 11:20
Thank you very much for this wonderful answer. I have been trying to implement it in my own project and have stumbled upon a few snags, mostly due to my mathematical shortcomings. I was hoping you might shed some light on some of it. 1) I have used AlgLib's RMatrixSVD function to obtain H, but in none of the results have I found your test values; 2) Is the order of the WP values in the WP matrix not mixed up? — Domus, Nov 28 '09 at 22:43
Before normalizing I verified whether any of the values in the resulting matrices after the SVD operation corresponded to the first H matrix in your answer, and none appeared, including -0.5692, which was supposed to be h33. I used the same values (coordinates) as in the answer. I suppose I'm overlooking something tiny, but it's been haunting me for days now. — Domus, Nov 29 '09 at 12:10
Jacob, could you explain a little more about what your doing to compute matrix A. When feeding A into a general SVD implementation I'm not seeing the same results as Domus has also stated. I have a feeling that it may be due to differences in what type of matrix the SVD implementation is expecting, from the pseudo-code it looks like A will be a 8x9 matrix, is that right? If you could give a brief description of the actual purpose of the operations being done in your pseudo-code to compute A it would help greatly. — jpierson, Jul 20 '10 at 16:52
@Domus: The results of `H` varies between implementation **before** normalization. You should only compare my `H` and yours after dividing by `h33`. — Jacob, Jul 20 '10 at 18:20
@jpierson: I've included the MATLAB code as well as the `A` matrix. — Jacob, Jul 20 '10 at 18:20
Thank you Jacob for your quick response. I plugged your A matrix into this Online Matrix Calculator to calculate the SVD and the output for U is still a 8x9 matrix instead of the 3x3 matrix you originally posted. I feel like I'm missing something elementary, should I be looking for a reduced form of this matrix? http://www.bluebit.gr/matrix-calculator — jpierson, Jul 20 '10 at 18:55
@jpierson: I apologize, looks like I forgot to mention how to get `H` from `V`. I've updated my answer (it's under *Computing H*). But the `V^T` from the SVD on that website is not quite accurate. Also, remember, it's the transpose of `V` on that website, whereas you need to last column of `V` (and not the transpose). — Jacob, Jul 20 '10 at 19:41
@jpierson: If you use the MATLAB code I've included with octave (freely available at http://www.gnu.org/software/octave/) you'll see that the results are correct (since you have a more accurate decomposition). — Jacob, Jul 20 '10 at 19:44
@jpierson: **WAIT!** You don't need octave: if you set the output precision to 15 decimal places, you get an accurate `V` (doh!). So all you have to do to get the right answer is: 1) Copy the `V` 2) Transpose it 3)Take the last vector and reshape it into a 3x3 matrix (row-major) 4) Now you have `H`! 5) Normalize with `H(3,3)` — Jacob, Jul 20 '10 at 19:50
@jpierson: Or equivalently, take the first row of `V`, reshape it into a 3x3 matrix and normalize. — Jacob, Jul 20 '10 at 19:53
@Jacob: Excellent based off of your original description of V^T and the fact that we needed the last column we have been able to get our code mostly working. We are using C# and ALGLIB currently. — jpierson, Jul 21 '10 at 01:27
@Jacob: Thank you very much for your answer. It's been a while and I finally managed, but can't remember how! In any case, the result was phenomenal. I was able to draw directly on my screen using a small flashlight. :) — Domus, Jul 26 '10 at 16:10
@Domus: I'm guessing you tracked a point (the flash light) but how did you figure out the orientation? — Jacob, Jul 26 '10 at 17:00
@Jacob: I just aimed the light at the screen (closely enough). The webcam was aimed at the screen and I just had to map the illuminated area, thanks to your algorithm, to the real screen coordinates, and then draw a dot at these coordinates. So the cam was aimed at the screen, not the flashlight. My next step, if ever I find the time, is to have it detect my finger. And then on to some serious interaction. The applications are countless, and it annoys me that other work is in the way of proceeding with it. — Domus, Aug 01 '10 at 16:14
Another amazing answer by you! Again, great that you link tools for testing ad references. :) — Tormod Haugene, Sep 05 '13 at 09:01

score 2 · Answer 2 · answered Nov 06 '09 at 21:02

2

Due to perspective effects linear or even bilinear transformations may not be accurate enough. Look at correct perspective mapping and more from google on this phrase, may be this is what you need...

answered Nov 06 '09 at 21:02

maxim1000

6,297
1
23
19

2

Although Jacob's answer is really lovely, articulate, and has a lot of work in put into it, he seems to miss this important point. Linear systems only compute affine or linear transforms and this transform is clearly not linear. No matter how beautiful the code, programs that compute the correct numbers are better. My uptick to you, maxim. – Die in Sente Nov 10 '09 at 21:21
1

maxim & Die in Sente: When used with homogeneous coordinates, linear transforms give the correct answer. Where your link on correct perspective mapping says "dividing by depth z", Jacob says "we normalise by the third element". – Chris Johnson Jul 20 '10 at 19:48

score 0 · Answer 3 · edited May 23 '17 at 12:17

0

More of how to do this in objective-c in xcode, related to jacobs post, you can find here: calculate the V from A = USVt in objective-C with SVD from LAPACK in xcode

edited May 23 '17 at 12:17

Community

1
1

answered Aug 10 '12 at 10:03

fellowworldcitizen

3,441
3
15
17

score 0 · Answer 4 · answered Sep 16 '12 at 03:14

0

The "Kabcsh Algorithm" does exactly this: it creates a rotation matrix between two spaces given N matched pairs of positions.

http://en.wikipedia.org/wiki/Kabsch_algorithm

answered Sep 16 '12 at 03:14

user1675087

31
2

Ben S · Answer 5 · 2009-11-06T19:54:02.540

Since your input area isn't a rectangle of the same aspect-ratio as the screen, you'll have to apply some sort of transformation to do the mapping.

What I would do is take the proportions of where the inner point is with respect to the outer sides and map that to the same proportions of the screen.

To do this, calculate the amount of the free space above, below, to the left, and to the right of the inner point and use the ratio to find out where in the screen the point should be.

alt text http://img230.imageshack.us/img230/5301/mapkg.png

Once you have the measurements, place the inner point at:

x = left / (left + right)
y = above / (above + below)

This way, no matter how skewed the webcam frame is, you can still map to the full regular rectangle on the screen.

This will produce some strange results. For example, all points on the left edge of the polygon will map to the top-left corner of the screen. — Tom Sirgedas, Jul 20 '10 at 20:09

culebrón · Answer 6 · 2009-11-06T20:50:37.530

Try the following: split the original rectangle and this figure with 2 diagonals. Their crossing is (k, l). You have 4 distorted triangles (ab-cd-kl, cd-ef-kl, ef-gh-kl, gh-ab-kl) and the point xy is in one of them.

(4 triangles are better than 2, since the distortion doesn't depend on the diagonal chosen)

You need to find in which triangle point XY is. To do that you need only 2 checks:

Check if it's in ab-cd-ef. If true, go on with ab-cd-ef, (in your case it's not, so we proceed with cd-ef-gh).
We don't check cd-ef-gh, but already check a half of it: cd-gh-kl. The point is there. (Otherwise it would have been ef-gh-kl)

Here's an excellent algorythm to check if a point is in a polygon, using only it's points.

Now you need only to map the point to the original triangle cd-gh-kl. The point xy is a linear combination of the 3 points:

x = c * a1 + g * a2 + k * (1 - a1 - a2)
y = d * a1 + h * a2 + l * (1 - a1 - a2)
a1 + a2 <= 1

2 variables (a1, a2) with 2 equations. I guess you can derive the solution formulae on your own.

Then you just make a linear combinations of a1&a2 with the corresponding points' co-ordinates in the original rectangle. In this case with W (width) and H (height) it's

X = width * a1 + width * a2 + width / 2 * (1 - a1 - a2)
Y = 0 * a1 + height * a2 + height / 2 * (1 - a1 - a2)

Transforming captured co-ordinates into screen co-ordinates

6 Answers6

MATLAB Code:

A

Linked