One of the most authoritative books on this topic is Hartley and Zisserman's *Multiple View Geometry in Computer Vision*.
The (very) high level steps are (a code sketch of the main steps follows the list):
- For each image, identify a set of features (e.g. SIFT, ORB, SURF or similar)
- For each pair of images, find correspondences between the features detected in each one
- Using these correspondences, calculate the Fundamental matrix F, which describes the epipolar relationship between the two images
- From F (and ideally some internal camera parameters), calculate the relative pose of the second camera with respect to the first
- Using F, rectify image pairs such that corresponding points lie on the same scan lines
- Calculate a dense disparity map from the rectified pair, then convert it to a per-pixel depth map
- Given the camera pose and the depth of each pixel, back-project each pixel to a 3D point in space
- Use a technique such as bundle adjustment to jointly optimise the estimated 3D coordinates and camera poses across the entire set of points and images
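
The sparse half of this pipeline maps fairly directly onto OpenCV. Below is a minimal Python sketch under some loud assumptions: `left.jpg`/`right.jpg` are placeholder filenames, and the intrinsic matrix `K` is a made-up placeholder that you would replace with values from a real calibration (e.g. `cv2.calibrateCamera`).

```python
import cv2
import numpy as np

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder filenames
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# Detect features and compute descriptors (ORB used here; SIFT/SURF also work).
orb = cv2.ORB_create(nfeatures=5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors between the two images.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate the fundamental matrix F with RANSAC to reject bad matches.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
pts1, pts2 = pts1[mask.ravel() == 1], pts2[mask.ravel() == 1]

# With intrinsics K, upgrade F to the essential matrix E and recover the
# pose of the second camera relative to the first. Note that the
# translation t is only recovered up to scale.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # placeholder intrinsics
E = K.T @ F @ K
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)

# Triangulate the inlier correspondences into sparse 3D points.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # camera 1 at the origin
P2 = K @ np.hstack([R, t])                         # camera 2 posed by R, t
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T  # dehomogenise to Nx3
```

Note that OpenCV does not ship a general-purpose bundle adjuster for the final refinement step; for that, people typically reach for Ceres Solver, g2o, or a complete SfM package such as COLMAP.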
The OpenCV calib3d module contains many useful functions for this process, and Google is your friend for more details on how to apply them.
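
For instance, here is a sketch of the dense half (rectification, disparity, back-projection), continuing from the snippet above: it reuses `K`, `R`, `t`, `img1` and `img2` from there and assumes zero lens distortion.

```python
import cv2
import numpy as np

h, w = img1.shape
dist = np.zeros(5)  # assume no lens distortion

# Rectify both images so corresponding points lie on the same scan lines.
R1, R2, P1r, P2r, Q, _, _ = cv2.stereoRectify(K, dist, K, dist, (w, h), R, t)
m1x, m1y = cv2.initUndistortRectifyMap(K, dist, R1, P1r, (w, h), cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K, dist, R2, P2r, (w, h), cv2.CV_32FC1)
rect1 = cv2.remap(img1, m1x, m1y, cv2.INTER_LINEAR)
rect2 = cv2.remap(img2, m2x, m2y, cv2.INTER_LINEAR)

# Compute a dense disparity map with semi-global block matching.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(rect1, rect2).astype(np.float32) / 16.0  # SGBM output is fixed-point

# Back-project every pixel to a 3D point via the reprojection matrix Q.
# Because t was recovered only up to scale, the depths are up to scale too.
points3d = cv2.reprojectImageTo3D(disparity, Q)
```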
Also see: OpenCV with stereo 3D reconstruction, OpenCV 3D reconstruction using shipped images and examples