In many cases, 3d hidden surface removal works like this: when you draw a surface, you also remember the depth of each pixel you draw (its distance from the 'eye'). When you go to draw a pixel where a surface has already been drawn, you only draw it if it's closer to the eye than the pixel that's already there. In OpenGL, the 3d graphics library that projects 3d scene descriptions onto a 2d display, this is called the depth buffer test.
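Here's a minimal software sketch of that idea (the buffer and function names are my own, not an OpenGL API):

```python
# A tiny software z-buffer. depth_buffer holds the nearest depth seen so
# far at each pixel; a new fragment is drawn only if it is closer.

WIDTH, HEIGHT = 4, 4
FAR = float("inf")

color_buffer = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]
depth_buffer = [[FAR] * WIDTH for _ in range(HEIGHT)]

def draw_pixel(x, y, depth, color):
    """Draw only if this fragment is nearer to the eye than what's there."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color

draw_pixel(1, 1, 5.0, (255, 0, 0))   # red surface at depth 5: drawn
draw_pixel(1, 1, 9.0, (0, 0, 255))   # blue surface behind it: rejected
draw_pixel(1, 1, 2.0, (0, 255, 0))   # green surface in front: wins
```

After those three calls, pixel (1, 1) holds the green surface at depth 2, because the depth test rejected the deeper blue fragment.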
You might also keep track of which direction each surface is facing. If it faces away from the 'eye', then don't draw it at all; in OpenGL, this is called backface culling. If you want to create the layered transparency look that your article describes, then you have to sort the surfaces by depth and draw the deepest ones first. Then draw the nearer ones on top, replacing the current pixel with a convex combination of the old pixel color and the current surface's color.
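Both ideas fit in a few lines. This is a sketch with made-up helper names, assuming the view direction points from the eye into the scene:

```python
def is_backfacing(normal, view_dir):
    """A face pointing away from the eye has normal . view_dir > 0
    (view_dir pointing from the eye into the scene)."""
    nx, ny, nz = normal
    vx, vy, vz = view_dir
    return nx * vx + ny * vy + nz * vz > 0

def blend(old_color, surface_color, alpha):
    """Convex combination: new surface at opacity alpha over the old pixel."""
    return tuple(alpha * s + (1 - alpha) * o
                 for s, o in zip(surface_color, old_color))

# Two translucent surfaces covering one pixel: (depth, color, alpha).
surfaces = [(2.0, (0.0, 0.0, 1.0), 0.5),   # near blue
            (8.0, (1.0, 0.0, 0.0), 0.5)]   # far red

pixel = (0.0, 0.0, 0.0)                    # black background
for depth, color, alpha in sorted(surfaces, key=lambda s: -s[0]):
    pixel = blend(pixel, color, alpha)     # deepest first, nearer on top
```

The far red surface is composited over the background first, then the near blue one over that, which is why the draw order matters: blending is not commutative.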
For your 4d case, you need to decide what it means to project a 4d thing. I think that any 3d model that's also animated is actually a 4d model. You could, if you wanted, draw all of the frames at once as with this simulated racquetball player:

In this image, time is represented by grayscale value. So you can see the player move across the court and hit the ball, all in a single static image. Of course, a proper 4d to 2d projection probably wouldn't display a bunch of discrete time frames drawn together, but would instead connect the vertices that bridge time, so that rather than a bunch of individual balls you would see a 'tube' representing the ball's trajectory. But, as the article you link mentions, for some cases that might communicate less.
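The frames-drawn-together approach can be sketched as a projection of each 4d point (x, y, z, t) to a 2d screen position plus a grayscale value. The orthographic drop of z and the linear time-to-gray mapping are just assumptions for illustration:

```python
def project_4d(point, t_min, t_max):
    """Map a 4d point (x, y, z, t) to ((screen_x, screen_y), gray),
    where gray runs from 0.0 (earliest frame) to 1.0 (latest)."""
    x, y, z, t = point
    screen = (x, y)                       # simple orthographic projection
    gray = (t - t_min) / (t_max - t_min)  # time encoded as brightness
    return screen, gray

# The ball's trajectory sampled at three moments in time.
samples = [(0.0, 1.0, 5.0, 0.0),
           (2.0, 1.5, 5.0, 1.0),
           (4.0, 1.0, 5.0, 2.0)]
projected = [project_4d(p, 0.0, 2.0) for p in samples]
```

Drawing each projected sample as a dot gives the discrete-frames image; connecting consecutive samples instead of drawing them separately gives the 'tube' version of the same trajectory.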