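To fully recover a scene's 3D data from a given image, you need to know the perspective projection parameters that formed the image. They are: the focal length (or, equivalently, the viewing angle/aperture), the camera rotation angles (yaw, pitch, roll), and the camera translations.
In detail: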
The focal length can be obtained from the viewing angle with this formula: fx = imageWidth/(2*tan(alphaX/2)), where alphaX is the full horizontal viewing angle (aperture), and similarly for the other dimension. If you have neither fx nor the aperture, you cannot reconstruct your 3D data.
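As a quick illustration of the focal length relation, here is a minimal sketch. It assumes the viewing angle is the full angle across the image (so half of it goes into the tangent) and that the principal point sits at the image center. The function name focal_length_px and the 640 px / 90 degree values are made up for the example.

```python
import math

def focal_length_px(image_size_px, fov_deg):
    """Focal length in pixels from the FULL viewing angle along one axis."""
    half_angle = math.radians(fov_deg) / 2.0
    return image_size_px / (2.0 * math.tan(half_angle))

# Hypothetical camera: 640x480 image, 90 degree horizontal viewing angle.
fx = focal_length_px(640, 90.0)
print(fx)  # approximately 320 px
```

For fy, call the same function with the image height and the vertical viewing angle.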
Another way to obtain them is to calibrate the camera. See http://opencv.itseez.com/modules/calib3d/doc/calib3d.html, but it seems you cannot use it (you said you don't have access to the camera).
The vanishing point (VP) is used to determine the angles at which the camera was oriented. The offset between the image center and the VP gives you the rotation info (as a linear approximation, valid for small angles):
yaw = ((camera.center.x (pixels) - VP.x)/image.x) * aperture.x
pitch = ((camera.center.y (pixels) - VP.y)/image.y) * aperture.y
The roll angle cannot be extracted from VP, but from the horizon line.
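The angle estimates above can be sketched as follows. This is only an illustration: the function names, the sample pixel coordinates, and the sign conventions (image y axis pointing down, angles in degrees) are assumptions, and the roll estimate simply takes the tilt of a horizon line given by two image points.

```python
import math

def yaw_pitch_from_vp(vp, center, image_size, aperture_deg):
    """Linear approximation: offset of the VP from the image center,
    as a fraction of the image size, times the aperture on that axis."""
    yaw = (center[0] - vp[0]) / image_size[0] * aperture_deg[0]
    pitch = (center[1] - vp[1]) / image_size[1] * aperture_deg[1]
    return yaw, pitch

def roll_from_horizon(p1, p2):
    """Roll in degrees as the tilt of the horizon line through p1 and p2."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))

# Hypothetical values: 640x480 image, 60 x 45 degree aperture.
yaw, pitch = yaw_pitch_from_vp(vp=(400, 200), center=(320, 240),
                               image_size=(640, 480), aperture_deg=(60.0, 45.0))
roll = roll_from_horizon((0, 240), (640, 240))  # a level horizon gives zero roll
print(yaw, pitch, roll)
```

If fx is known, atan((camera.center.x - VP.x)/fx) is the non-linearized version of the same yaw estimate.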
The last parameters you need are the translations. Depending on the application, you can set them all to 0, or consider only the height as relevant. Usually none of them can be recovered from the image alone.
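To show how the pieces fit together, here is a rough sketch that collects the parameters into an intrinsic matrix K, a rotation matrix R and a translation t. The rotation order R = Rz(roll) * Rx(pitch) * Ry(yaw) and the axis convention (x right, y down, z forward) are assumptions chosen for the example, not something the image dictates. The numeric values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(yaw_deg, pitch_deg, roll_deg):
    """Assumed convention: R = Rz(roll) * Rx(pitch) * Ry(yaw)."""
    y, p, r = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
    ry = [[math.cos(y), 0.0, math.sin(y)],
          [0.0, 1.0, 0.0],
          [-math.sin(y), 0.0, math.cos(y)]]
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(p), -math.sin(p)],
          [0.0, math.sin(p), math.cos(p)]]
    rz = [[math.cos(r), -math.sin(r), 0.0],
          [math.sin(r), math.cos(r), 0.0],
          [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(rx, ry))

def intrinsics(fx, fy, cx, cy):
    """Pinhole intrinsic matrix with the principal point at (cx, cy)."""
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]

K = intrinsics(320.0, 320.0, 320.0, 240.0)
R = rotation(-7.5, 3.75, 0.0)
t = [0.0, 0.0, 0.0]  # all zero here; set the height component if it matters
```

With a different axis or rotation-order convention the individual matrices change, so the choice should match whatever code consumes K, R and t.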
Now, having all this data, you can look here
Opencv virtually camera rotating/translating for bird's eye view
to see how all these measures influence your perspective correction.