The actual physics of a lens is explained, for example, on this website of Georgia State University.
See this illustration which explains how you can use either the linear magnification or focal length relations to find out object size from image size:

In particular, -i / h' = o / h, and the ratio o / h is the same for all similar triangles (that is, an object of size 2h at distance 2o has the same size h' on the picture). So as you can see, even with the full equation, you can't know both the distance o and the size h of an object from the image alone; however, one will give you the other.
On the other hand, two objects at the same distance o will have sizes h1' and h2' on the image that are proportional to their real-life sizes h1 and h2, since h1' / h1 = M = h2' / h2.
Hence, if you know both o and h for one object, you know M; then, knowing an object's size on film, you can deduce its real size from its distance and vice versa.
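To make the same-distance case concrete, here is a minimal Java sketch. The class and method names (MagnificationSketch, magnification, realSize) and the numbers are illustrative assumptions, not part of any library. Sizes are treated as magnitudes, ignoring the sign that the lens convention gives M for an inverted image.

```java
/** Sketch: calibrating with a reference object at the same distance (illustrative names). */
final class MagnificationSketch {

    /** Magnification M = h' / h, from a reference object whose real size is known. */
    static double magnification(double refImageSize, double refRealSize) {
        return refImageSize / refRealSize;
    }

    /** Real size of another object at the same distance: h = h' / M. */
    static double realSize(double imageSize, double magnification) {
        return imageSize / magnification;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a 0.30 m reference spans 150 px, the unknown object 400 px.
        double m = magnification(150.0, 0.30); // pixels per metre at this distance
        System.out.println("Estimated real size (m): " + realSize(400.0, m));
    }
}
```

This only holds while both objects sit at the same distance o; for a different distance, M changes in proportion to 1 / o.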
The -i / h' value is naturally expressed for the maximal h'. If an object fills the image exactly, it fills the field of view, and the ratio of half its size to its distance is tan(α/2) = (l / 2) / d (note that in the conventions of the image below, d = o and l = 2 * h).
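As a worked instance of that relation, take illustrative values that are not from the original question: a view angle α = 60° and a distance d = 2 m. The width that exactly fills the frame is then:

```latex
\tan\frac{\alpha}{2} = \frac{l/2}{d}
\quad\Longrightarrow\quad
l = 2\,d\,\tan\frac{\alpha}{2}
  = 2 \cdot 2\,\mathrm{m} \cdot \tan 30^\circ
  \approx 2.31\,\mathrm{m}
```

An object that spans half the image width at that same distance would therefore be about 1.15 m across.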

This α is what you call theta in your example. Now, from the image size you can work out the angle under which you see the object, that is, what size l the object would have if it were at distance d. From there, you can deduce the size of the object from its distance and vice versa.
Algorithm steps:
- Get the ratio r = size of object in image (in px) / total size of image (in px). Do this along the axis for which you know or plan to get the real object size, of course.
- Get the field-of-view angle for that axis and multiply r by the tangent of half that angle: r *= tan(camera.getParameters().getXXXXViewAngle() / 2), where XXXX is Horizontal or Vertical. These getters return degrees, so convert to radians before taking the tangent. r is now the tangent of the half-angle under which you see the object, hence the following relations hold: r = (l / 2) / d = h / o (with the respective drawings' notations).
- If you know the distance d to the object, its size is l = 2 * r * d.
- If you know the size l of the object, its distance is d = l / (2 * r).
This works for objects that the camera is actually pointed at; if they aren't centred, the maths may be off.
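The steps above can be sketched in plain Java as follows. The class and method names are hypothetical and the numbers in main are made up. On Android the view angle would presumably come from camera.getParameters().getHorizontalViewAngle() or getVerticalViewAngle(), which, as far as I know, report degrees, hence the conversion to radians.

```java
/** Sketch of the algorithm steps; names are illustrative, not an Android API. */
final class ObjectSizeEstimator {

    /**
     * Tangent of the half-angle under which the object is seen.
     *
     * @param objectPx         size of the object in the image, in pixels
     * @param imagePx          total size of the image along the same axis, in pixels
     * @param viewAngleDegrees full camera view angle along that axis, in degrees
     */
    static double halfAngleTangent(double objectPx, double imagePx, double viewAngleDegrees) {
        double ratio = objectPx / imagePx;
        // Math.tan expects radians, so convert the full angle before halving it.
        return ratio * Math.tan(Math.toRadians(viewAngleDegrees) / 2.0);
    }

    /** l = 2 * r * d: real size from a known distance. */
    static double sizeFromDistance(double r, double distance) {
        return 2.0 * r * distance;
    }

    /** d = l / (2 * r): distance from a known real size. */
    static double distanceFromSize(double r, double size) {
        return size / (2.0 * r);
    }

    public static void main(String[] args) {
        // Hypothetical input: object spans 480 of 1920 px, 60 degree view angle.
        double r = halfAngleTangent(480, 1920, 60.0);
        double size = sizeFromDistance(r, 3.0); // object assumed to be 3 m away
        System.out.println("size (m): " + size);
        System.out.println("distance recovered from that size (m): " + distanceFromSize(r, size));
    }
}
```

Keeping r as a separate intermediate value mirrors the steps: the same r serves both directions, size from distance and distance from size.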