
I want to estimate the distance from the camera to a point on the ground (that is, Yw = 0), given the pixel coordinates of that point. For that I used camera calibration methods.

But the results are not meaningful.

I have the following calibration details:

- focal length (x and y), principal point (x and y), effective pixel size in meters, yaw and pitch angles, camera height, etc.

- I have entered the focal length, principal point, and translation vector in pixel units for the calculation.

- I have multiplied the image point by the camera matrix and then by the rotation/translation matrix (R|t) to get the world point.

Is my procedure correct? What could be wrong?

Result:

image_point (x, y) = 400, 380
world_point z coordinate (distance) = 12.53

image_point (x, y) = 400, 180
world_point z coordinate (distance) = 5.93

Problem: the z coordinate I get is only a few pixels, which means the z coordinate is << 1 m (because the effective pixel size in meters is 10^-5).
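
For reference, the stated physical parameters convert to the pixel units used in the code below like this (a quick sanity check of the numbers above):

% converting the physical calibration values to pixel units
f_m  = 0.012;       % focal length in meters
h_m  = 1.17;        % camera height in meters
px_m = 1e-5;        % effective pixel size in meters
f_px = f_m / px_m   % = 1200, the focal length entry in intrinsic_params
h_px = h_m / px_m   % = 117000, the entry in the translation vector t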

This is my MATLAB code:

%positive downward pitch
xR = 0.033;
yR = 0;
zR = pi;



%effective pixel size in meters = 10 ^-5 ; focal_length x & y = 0.012 m 
% principal point x & y = 320 and 240 
intrinsic_params =[1200,0,320;0,1200,240;0,0,1];
Rx=[1,0,0 ; 0,cos(xR),sin(xR); 0,-sin(xR),cos(xR)];
Ry=[cos(yR),0,-sin(yR) ; 0,1,0 ; sin(yR),0,cos(yR)];
Rz=[cos(zR),sin(zR),0 ; -sin(zR),cos(zR),0 ; 0,0,1];

R= Rx * Ry * Rz ;


% The camera is 1.17m above the ground
t=[0;117000;0];

extrinsic_params = horzcat(R,t);

% extrinsic_params is  3 *4  matrix

P = intrinsic_params * extrinsic_params; % P 3*4 matrix

% make it square ....
P_sq = [P; 0,0,0,1];

%image size is 640 x 480
%An arbitrary pixel 400,380 is entered as input

image_point = [400,380,0,1]; 
% world point will be in the form X Y Z 1 
world_point =    P_sq *  image_point'
  • Welcome to SO. Although you have described your procedure, your Matlab code would be easier to use in an answer, because there is less chance of misunderstanding. Also, could you clarify "not meaningful" - perhaps give an example input and expected output? – Neil Slater Jan 19 '15 at 15:02
  • Your procedure looks correct, so you should be making a mistake somewhere, probably. As Neil proposed, post your code with explanations part by part, as that will be easier to debug. – Ander Biguri Jan 19 '15 at 15:43
  • I edited my post by adding matlab code. thanks – Sachini Jan 19 '15 at 17:32
  • I also added code's result and the problem – Sachini Jan 19 '15 at 18:39

1 Answer


Your procedure is partly right, but it is going in the wrong direction. See this link. Using your intrinsic and extrinsic calibration matrices you can find the pixel-space position of a real-world point, NOT the other way around. The exception is when your camera is stationary in the global frame and you know the Z position of the feature in global space.

Stationary camera, known feature Z case: (see also this link)

%% First we simulate a camera feature measurement
K = [0.5 0 320;
    0 0.5 240;
    0 0 1]; % Example intrinsics
R = rotx(0)*roty(0)*rotz(pi/4); % orientation of camera in global frame
c = [1; 1; 1]; %Pos camera in global frame

rwPt = [ 10; 10; 5]; %position of a feature in global frame
imPtH = K*R*(rwPt - c); %Homogeneous image point
imPt = imPtH(1:2)/imPtH(3) %Actual image point


%% Now we use the simulated image point imPt and the knowledge of the
% feature's Z coordinate to determine the feature's X and Y coordinates
%% First determine the scaling term lambda
imPtH2 = [imPt; 1]; 
z = R.' * inv(K) * imPtH2;
lambda = (rwPt(3)-c(3))/z(3);

%% Now the RW position of the feature is:
rwPt2 = c + lambda*R.' * inv(K) * imPtH2 % Reconstructed RW point
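
Applied to the setup described in the question (flat ground, camera 1.17 m above it, 0.033 rad downward pitch, intrinsics converted to pixels), the same recipe would look roughly like the sketch below. The world-frame convention (X right, Y forward, Z up) and the pitch-only rotation are assumptions on my part, so adjust them to your actual calibration:

%% Sketch: the same recipe for a ground-plane point, using the question's numbers
K = [1200 0 320;
     0 1200 240;
     0    0   1];                     % intrinsics in pixels (0.012 m / 1e-5 m per px)

% World frame: X right, Y forward, Z up. Camera frame: x right, y down, z forward.
R_base  = [1 0 0; 0 0 -1; 0 1 0];     % axis swap, world -> level camera
th      = 0.033;                      % downward pitch in radians (assumed about x only)
R_pitch = [1 0 0; 0 cos(th) -sin(th); 0 sin(th) cos(th)];
R = R_pitch * R_base;                 % world -> camera rotation
c = [0; 0; 1.17];                     % camera centre, 1.17 m above the ground plane Z = 0

imPt   = [400; 380];                  % pixel from the question
ray    = R.' * (K \ [imPt; 1]);       % back-projected ray direction in the world frame
lambda = (0 - c(3)) / ray(3);         % scale so that the point lands on the ground Z = 0
rwPt   = c + lambda * ray             % reconstructed ground point
dist   = norm(rwPt - c)               % straight-line camera-to-point distance in metres

If the vehicle pitches or rolls while driving, R and c have to be updated accordingly, which is the caveat discussed in the comments below.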

Non-stationary camera case:

To find the real-world position or distance from the camera to a particular feature (given on the image plane) you have to employ some method of reconstructing the 3D data from the 2D image.

The two that come to mind immediately are OpenCV's solvePnP and stereo-vision depth estimation. solvePnP requires 4 coplanar (in RW space) features to be visible in the image, with their positions in RW space known. This may not sound useful, since you need to know the RW positions of the features, but you can simply define the 4 features with a known offset rather than a position in the global frame - the result will then be the relative position of the camera in the frame the features are defined in. solvePnP gives very accurate pose estimation of the camera. See my example.
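
For the coplanar case, solvePnP essentially estimates a homography from the four point correspondences and decomposes it with the intrinsics. A rough MATLAB sketch of that idea is below; the point coordinates are made-up placeholders, and in practice you would call a library routine rather than roll your own:

% Planar pose from 4 coplanar points (the idea behind solvePnP for this case)
K = [1200 0 320; 0 1200 240; 0 0 1];            % example intrinsics in pixels
worldPts = [0 0; 0.5 0; 0.5 0.5; 0 0.5];        % 4 points on the plane Z = 0 (metres)
imagePts = [310 260; 350 262; 352 300; 312 298];% their pixel locations (placeholders)

% Direct linear transform for the homography H with  s*[u; v; 1] = H*[X; Y; 1]
A = zeros(8, 9);
for i = 1:4
    X = worldPts(i,1); Y = worldPts(i,2);
    u = imagePts(i,1); v = imagePts(i,2);
    A(2*i-1,:) = [-X -Y -1  0  0  0  u*X u*Y u];
    A(2*i,  :) = [ 0  0  0 -X -Y -1  v*X v*Y v];
end
[~, ~, V] = svd(A);
H = reshape(V(:,end), 3, 3).';

% Decompose H = K*[r1 r2 t] (up to scale) to recover the relative pose
B  = K \ H;
s  = 1 / norm(B(:,1));       % flip the sign of s if the points end up behind the camera
r1 = s*B(:,1);  r2 = s*B(:,2);  r3 = cross(r1, r2);
[U, ~, W] = svd([r1 r2 r3]);
Rcam = U*W.'                 % world -> camera rotation (orthonormalised)
tcam = s*B(:,3)              % translation: the camera pose relative to the 4 points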

Stereo vision depth estimation requires the same feature to be found in two spatially separated images, and the transformation between the images in RW space must be known very precisely.
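
The core relation behind stereo depth estimation is depth = focal length × baseline / disparity; a toy illustration with made-up numbers:

% Depth from disparity (toy numbers)
f  = 1200;            % focal length in pixels
B  = 0.12;            % baseline between the two cameras in metres (assumed)
xL = 400; xR = 364;   % column of the same feature in the left and right image
d  = xL - xR;         % disparity in pixels
Z  = f * B / d        % depth of the feature in metres (4 m with these numbers)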

There may be other methods but these are the two I am familiar with.

  • Thanks. My application is estimating the distance to pedestrians, and I will consider only the distance to the bottom of the bounding box of the detected pedestrian. I assume the world is flat. With these assumptions and requirements, is it not possible to get the distance by a simple matrix multiplication as in my code? – Sachini Jan 20 '15 at 07:17
  • I have added example code of transforming from 2D->3D points given that your intrinsics and extrinsics are known, and your camera is stationary in the global frame. I had assumed your camera was moving and so this method would not work, however as I now understand it your camera is stationary and you can assume the bounding box is at a given height (from the pedestrian's feet being on the ground). Apologies for the misunderstanding. – Gouda Jan 21 '15 at 02:29
  • The camera is fixed to a moving vehicle. I'm not considering pedestrians height. I consider only the bottom mid point of the bounding box (point where pedestrian touches the ground). My requirement is getting distance to pedestrian to avoid collision. I do not know the real world position of the pedestrian. I want to find it by just using the monocular image. Is this impossible when I know only the camera intrinsic and extrinsic? Is it necessary to use stereo vision? – Sachini Jan 21 '15 at 08:21
  • You only need to know the _Z component_ of the RW position of the _feature_, which in your case is the pedestrian's feet if I understood you correctly. Unless the pedestrian is jumping/flying that Z component will be very close to constant. As soon as your camera moves from the calibration pose, the extrinsic matrix is no longer valid. So you cannot use the method I described in the first half of my answer, unless you have an accurate pose of the vehicle that holds the camera. If you already have a feature detector working then implementing Stereo Vision depth estimation would be very simple. – Gouda Jan 21 '15 at 08:48
  • My understanding is that if the world is assumed flat, Z component can be assumed 0 (pedestrian is not jumping/flying). I know the height to the camera from the ground and the camera pitch. Is this information not enough for estimating distance from monocular vision ? I do not feel confident about this method either. – Sachini Jan 21 '15 at 10:19
  • You cannot solve your problem with this method unless you know the camera extrinsics at every time your camera captures an image. You can do this by knowing the pose of the vehicle and the translation between the vehicle body frame and the camera frame. – Gouda Jan 21 '15 at 12:19
  • Thank you very much. My original idea was to use stereo vision, but my instructor asked me to do it using monocular vision. I will try to explain the feasibility issue to him and use stereo vision. – Sachini Jan 21 '15 at 16:55