I have frames from a surveillance camera observing a large public hall from an angle of roughly 45° relative to the ceiling. Every person is annotated, in each frame they appear in, with screen coordinates (1920 × 1080 px), so I have (x, y) pixel coordinates for each person in each frame.
Now I need to transform these coordinates into real-world coordinates, since there is naturally a distortion between sections close to the camera and sections further away from it. In the end I want coordinates in which constant real-world movement produces a constant rate of coordinate change, no matter where in the hall it occurs. (I measure the velocity of each person.)
I don't know the height of the camera or its exact angle, but it is sufficient to estimate both.
Thanks for any help.
EDIT: Basically this is a mathematical question. I don't have any code that deals with this part of the problem yet. The sole purpose is to improve the accuracy of my program by replacing the current, perspective-distorted coordinates with more accurate ones.
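EDIT 2: To make the question concrete, here is a sketch of the direction I'm considering: since everyone moves on the (planar) hall floor, the mapping from image pixels to floor coordinates should be a planar homography, which can be estimated from four floor points with known real-world positions instead of from the camera height/angle. All point values below are made-up placeholders, not real measurements.

```python
import numpy as np

# Hypothetical calibration data: four marks on the hall floor whose
# real-world positions (in metres) were measured, plus their pixel
# coordinates in the 1920x1080 frame. Values are placeholders.
pixel_pts = np.array([[ 410.0,  980.0],   # near-left floor mark
                      [1510.0,  975.0],   # near-right
                      [ 720.0,  310.0],   # far-left
                      [1190.0,  305.0]])  # far-right
world_pts = np.array([[0.0,  0.0],
                      [8.0,  0.0],
                      [0.0, 20.0],
                      [8.0, 20.0]])

def fit_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src (DLT, >= 4 points)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the
        # nine entries of H (up to scale).
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector for the smallest
    # singular value, i.e. the (approximate) null space of A.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def to_world(H, px):
    """Map a pixel coordinate to ground-plane coordinates (metres)."""
    q = H @ np.array([px[0], px[1], 1.0])
    return q[:2] / q[2]      # divide out the projective scale

H = fit_homography(pixel_pts, world_pts)
```

With the transformed coordinates, velocity would then be the frame-to-frame distance in metres rather than in (distorted) pixels. Does this approach make sense, or is a full camera calibration needed?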