
I am asking this question as a trimmed version of my previous question. I now have a face looking at some position on the screen, plus the gaze coordinates (pitch and yaw) of both eyes. Let us say

Left_Eye = [-0.06222888 -0.06577308]

Right_Eye = [-0.04176027 -0.44416167]

I want to identify the screen coordinates the person is most likely looking at. Is this possible? Please help!

pavan subhash
  • Do you calibrate against fixed positions on the screen? This has the potential to be a very hard problem, and is probably too broad unless you can spell out exactly which pieces of information you have available to calculate the coordinates. – Marius Oct 23 '18 at 02:48
  • @Marius Could you please elaborate on the calibration part? I assume you mean gaze calibration on a specific screen; if so, no, I didn't do that. Here is the info I have: 1. the camera and screen are always on the same plane, 2. the person-to-camera distance varies, 3. the camera is on top of the screen. – pavan subhash Oct 23 '18 at 02:59
  • You need a position and at least 2 vectors (up and forward) to describe your camera view **just the position is not enough** ... for stereo vision it is 2x position and 2x vectors. A camera is usually described by a 4x4 transform matrix (1x position, 3x vectors, and projection). – Spektre Oct 23 '18 at 03:24
  • @Spektre Could you please help me understand how to get those vectors, and what you mean by camera view? – pavan subhash Oct 23 '18 at 03:27
  • With respect to your linked question: the position of the eye alone will not do much; you also need the position of the pupils, and to decide their direction from the skew and relative difference ... – Spektre Oct 23 '18 at 03:27
  • @Spektre Yes, I do have the coordinates of the iris center. How do I decide the skew and relative difference? Please refer me to some documents on this. – pavan subhash Oct 23 '18 at 03:29
  • @pavansubhash see [Understanding 4x4 homogenous transform matrices](https://stackoverflow.com/a/28084380/2521214) and the sublinks there for some basic idea... the skew can be decided from the contour of the eye shape (major axes) ... this topic is a bit more complicated and outside my expertise, and also I need to go to work and will be back tomorrow ... – Spektre Oct 23 '18 at 03:31

1 Answer


What you need is:

  1. 3D position and direction for each eye

    You claim you have it, but pitch and yaw are just Euler angles, and to convert them back into a 3D vector you also need a reference frame and the order of transforms. It is better to leave the direction in vector form (which I suspect you had in the first place). Along with the direction you also need the position in 3D, in the same coordinate system (see the sketch after this list)...

  2. 3D definition of your projection plane

    So you need at least a start position and 2 basis vectors defining your planar rectangle. It is much better to use a 4x4 homogeneous transform matrix for this, because that allows a very easy transform to and from its local coordinate system...
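
For illustration, here is a minimal Python/NumPy sketch of both items. It assumes the gaze angles are in radians, the camera looks along +Z with yaw about the Y axis and pitch about the X axis (your gaze model may use a different convention), and the plane basis vectors and origin are placeholder values you would measure for your actual screen:

```python
import numpy as np

def gaze_to_vector(pitch, yaw):
    # Unit gaze direction in camera coordinates.
    # ASSUMED convention: camera looks along +Z, yaw about Y, pitch about X.
    return np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])

# Screen plane as a 4x4 homogeneous transform: columns are the U and V basis
# vectors (screen x/y), the plane normal W, and the plane origin P0.
# All values below are placeholders, not measurements from the question.
U  = np.array([1.0, 0.0, 0.0])    # screen x direction
V  = np.array([0.0, 1.0, 0.0])    # screen y direction
W  = np.cross(U, V)               # plane normal
P0 = np.array([-0.5, 0.1, 0.0])   # plane origin, e.g. a screen corner

plane = np.eye(4)
plane[:3, 0] = U
plane[:3, 1] = V
plane[:3, 2] = W
plane[:3, 3] = P0
```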

So I see it like this:

[figure: line of sight]

So now it is just a matter of finding the intersection between the rays and the plane:

P(s) = R0 + s*R
P(t) = L0 + t*L
P(u,v) = P0 + u*U + v*V

Solving this system will give you u,v, which is also the 2D coordinate inside your plane that you are looking at. Of course, because of inaccuracies this will not be solvable algebraically. So it is better to convert the rays into the plane's local coordinates, compute the point on each ray with w=0.0 (making this a simple linear equation with a single unknown), and compute the average of the point for the left eye and the one for the right eye (in case they do not align perfectly).

So if R0',R',L0',L' are the converted values in UVW local coordinates, then:

R0z' + s*Rz' = 0.0
s = -R0z'/Rz'
// so...
R1 = R0' - R'*R0z'/Rz'
L1 = L0' - L'*L0z'/Lz'
P = 0.5 * (R1 + L1)

Where P is the point you are looking at in the UVW coordinates...
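
A short sketch of this last step in Python/NumPy, assuming the rays have already been converted into the plane's local UVW coordinates (the primed values above); the numbers are placeholders, not data from the question:

```python
import numpy as np

R0p = np.array([0.10, 0.05, 0.60])   # right-eye origin, plane-local
Rp  = np.array([0.02, -0.01, -1.0])  # right-eye direction, plane-local
L0p = np.array([0.04, 0.05, 0.60])   # left-eye origin, plane-local
Lp  = np.array([0.03, -0.01, -1.0])  # left-eye direction, plane-local

def hit(origin, direction):
    # Point on the ray where the local w coordinate (index 2) is 0,
    # i.e. where the ray pierces the screen plane.
    s = -origin[2] / direction[2]    # undefined if the ray is parallel to the plane
    return origin + s * direction

P = 0.5 * (hit(R0p, Rp) + hit(L0p, Lp))
u, v = P[0], P[1]                    # 2D coordinates inside the plane
```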

The conversion itself is easy: depending on your notation, you multiply either the inverse or the direct matrix representing the plane by (R,0), (L,0), (R0,1), (L0,1). The fourth coordinate (0 or 1) just tells whether you are transforming a vector or a point.
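
A sketch of that conversion, assuming the plane matrix maps local UVW coordinates into world space (so its inverse is the one that goes world → local); the helper name is just illustrative:

```python
import numpy as np

def to_local(plane, p, w):
    # Multiply by the inverse of the plane matrix (assuming the matrix maps
    # local UVW coordinates into world space, as in the earlier sketch).
    # w = 1.0 transforms a point (translation applies), w = 0.0 a vector.
    return (np.linalg.inv(plane) @ np.append(p, w))[:3]

# Example with a plane shifted by +2 along world Z:
plane = np.eye(4); plane[2, 3] = 2.0
print(to_local(plane, np.array([0.0, 0.0, 0.0]), 1.0))  # point  -> [0, 0, -2]
print(to_local(plane, np.array([0.0, 0.0, 1.0]), 0.0))  # vector -> [0, 0,  1]
```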

Without knowing more about your coordinate systems, data accuracy, and which knowns and unknowns you have, it is hard to be more specific than this.

If your plane is the camera projection plane, then U,V are the x and y axes of the image taken from the camera and W is its normal (the direction is just a matter of notation).

As you are using camera input, which uses a perspective projection, I hope your positions and vectors are corrected for it.

Spektre
  • I tried a few things as per your answer but I am not able to proceed further: (1) I calculated the intrinsic and extrinsic matrices of the camera (however, the focal length seems to change on every trial, not sure why). (2) I converted pitch and yaw to X, Y, Z. (3) I converted the 2D iris center, from which the gaze originates, to a 3D point (camera coordinates) using the camera matrix. Now I am stuck on how to define the plane, assuming the camera is at (0,0,0). My screen is 42 inch x 40 inch; the camera is exactly at the center of the top edge. Please suggest – pavan subhash Nov 20 '18 at 05:39