Assuming that this is a pinhole camera model, you also need the width and height of the camera image sensor (CCD) - let's call these widthCCD and heightCCD respectively.
You need to do two steps:
- Figure out the physical 3D projection (physical point onto the camera sensor CCD)
- Figure out the image projection (CCD into image pixel space)
Let's assume you have two pixels in the image, (u1, v1) and (u2, v2). These map to the points (uc1, vc1) and (uc2, vc2) on the CCD sensor, which in turn correspond to the physical 3D points (X1, Y1, Z1) and (X2, Y2, Z2), as follows:
Note: Z1 = Z2 = d = 0.7 m (based on the information you provided)
Physical Projection (3D to CCD):
(uc1, vc1) -> (fx * X1/Z1, fy * Y1/Z1)
(uc2, vc2) -> (fx * X2/Z2, fy * Y2/Z2)
Image Projection (CCD to Image):
(u1, v1) -> (uc1 * width/widthCCD + cx, vc1 * height/heightCCD + cy)
(u2, v2) -> (uc2 * width/widthCCD + cx, vc2 * height/heightCCD + cy)
By applying substitution you can arrive at:
(u1, v1) -> ((fx * X1/Z1) * width/widthCCD + cx, (fy * Y1/Z1) * height/heightCCD + cy)
(u2, v2) -> ((fx * X2/Z2) * width/widthCCD + cx, (fy * Y2/Z2) * height/heightCCD + cy)
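The two-step projection above can be sketched in Python roughly as follows. This is only an illustration under the assumptions of this answer: the function name project_point and its parameter names are made up for the example, and the point is assumed to be given in the camera frame with Z > 0.

```python
def project_point(X, Y, Z, fx, fy, cx, cy, width, height, width_ccd, height_ccd):
    """Project a 3D point (camera frame) to image pixel coordinates (u, v).

    Hypothetical helper: the name and argument order are illustrative only.
    """
    if Z <= 0:
        raise ValueError("Z must be positive (point in front of the camera)")
    # Step 1: physical projection (3D point onto the CCD sensor).
    uc = fx * X / Z
    vc = fy * Y / Z
    # Step 2: image projection (CCD coordinates into image pixel space).
    u = uc * width / width_ccd + cx
    v = vc * height / height_ccd + cy
    return u, v
```

When widthCCD equals width and heightCCD equals height, the scale factors are 1 and this reduces to u = fx * X/Z + cx, v = fy * Y/Z + cy.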
Since I don't know the CCD sensor's width and height, I will assume for this example that they are the same as the image's (widthCCD = width, heightCCD = height), so both scale factors become 1:
- Plug in Z1 = Z2 = d:
(u1, v1) -> ((fx * X1/d) + cx, (fy * Y1/d) + cy)
(u2, v2) -> ((fx * X2/d) + cx, (fy * Y2/d) + cy)
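- Now let's find the physical distance between two pixels, (0, 0) and (168, 39). Plugging in the values of fx, fy, cx, cy and d:
(u, v) -> ((1.06935339e+03 * X/0.7) + 6.29035115e+02, (1.07107059e+03 * Y/0.7) + 3.54614962e+02)
Solving for X and Y gives X = (u - cx) * d / fx and Y = (v - cy) * d / fy, so:
For (0, 0), X = -0.41, Y = -0.23
For (168, 39), X = -0.30, Y = -0.21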
- Find the Euclidean distance between the two 3D points (Z is the same for both, so only X and Y contribute):
Distance between (-0.41, -0.23) and (-0.30, -0.21) = sqrt(0.11^2 + 0.02^2) ≈ 0.11 m
So the physical distance between the two points, assuming the CCD sensor has the same dimensions as the image, is about 0.11 meters.
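The worked example above can be reproduced with a short Python sketch. It assumes, as in this answer, that the CCD dimensions equal the image dimensions and that both points lie at the same depth d = 0.7 m. The function names pixel_to_xy and pixel_distance are made up for illustration.

```python
import math

# Intrinsics and depth taken from the example above.
FX, FY = 1.06935339e+03, 1.07107059e+03
CX, CY = 6.29035115e+02, 3.54614962e+02
D = 0.7  # depth in meters, assumed identical for both points


def pixel_to_xy(u, v, d=D, fx=FX, fy=FY, cx=CX, cy=CY):
    """Invert u = fx * X/d + cx and v = fy * Y/d + cy for X and Y."""
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return x, y


def pixel_distance(p1, p2, d=D):
    """Physical distance between two pixels that share the same depth d."""
    x1, y1 = pixel_to_xy(p1[0], p1[1], d=d)
    x2, y2 = pixel_to_xy(p2[0], p2[1], d=d)
    # Z is equal for both points, so only the X and Y differences matter.
    return math.hypot(x2 - x1, y2 - y1)


print(pixel_to_xy(0, 0))
print(pixel_to_xy(168, 39))
print(round(pixel_distance((0, 0), (168, 39)), 2))  # 0.11
```

Because both points share the same depth, cx and cy cancel out of the difference, and the result depends only on the pixel offsets, the focal lengths and d.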