Typically you have the following:
- N image layers with each image dimension = width * height
- Interpretation from a color value to some "thickness value"
The general idea is:
Create a 3D map of dimension N * width * height with either floating point or byte values. Then just add your image layers to that map, giving you something like a huge 3D texture. Now you can define the tissue-thickness you are interested in, for example bones. Then search each cell in your 3D map where the values differ from "less than bone-thickness" to "bigger or equal to bone-thickness" (or just cells which have the exact thickness value stored) and mark those cells as "bones". Then you have some voxel grid of your bones :)
A better approach is to use something like marching cubes
and interpolate between thickness changes.
Probably, if you google "marching cubes" and "x-ray" you'll find some more detailed information (and university lecture notes) about different ways to solve the approach. For example: http://www.eecs.berkeley.edu/~jrs/meshpapers/LorensenCline.pdf and from those papers you might find more tags to search for.