PowerVR does use a depth buffer, but in a different way than a regular (Immediate Mode Rendering) GPU does.
The deferred part of Tile-Based Deferred Rendering means that the triangles for a given scene are first processed (vertex-shaded, transformed, clipped, etc.) and saved into an intermediate buffer. Only after the entire scene has been processed are the tiles rendered, one by one.
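The two phases can be sketched in Python as below. This is a simplified illustration, not how PowerVR hardware actually bins geometry: the names (`Triangle`, `bin_triangles`, `render_frame`), the 32-pixel tile size and the conservative bounding-box binning are all assumptions made for the example.

```python
from dataclasses import dataclass

TILE_SIZE = 32  # assumed tile size in pixels; real tile sizes vary by GPU generation

@dataclass
class Triangle:
    tri_id: int
    verts: tuple  # screen-space (x, y, z) vertices, already transformed and clipped

def bin_triangles(triangles, width, height, tile_size=TILE_SIZE):
    """Geometry phase: add each processed triangle to every tile its bounding
    box overlaps. The returned dict stands in for the intermediate buffer."""
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for tri in triangles:
        xs = [v[0] for v in tri.verts]
        ys = [v[1] for v in tri.verts]
        # Conservative: a tile may receive a triangle that only its bounding box touches.
        x0 = max(0, int(min(xs)) // tile_size)
        x1 = min(tiles_x - 1, int(max(xs)) // tile_size)
        y0 = max(0, int(min(ys)) // tile_size)
        y1 = min(tiles_y - 1, int(max(ys)) // tile_size)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(tri)
    return bins

def render_frame(triangles, width, height, render_tile):
    # Phase 1: process the whole scene and store the result.
    bins = bin_triangles(triangles, width, height)
    # Phase 2: only now render the tiles, one by one.
    for tile, tris in bins.items():
        render_tile(tile, tris)
```

Because rendering starts only after binning has finished, each tile already knows every triangle that can touch it, which is what makes the hidden surface removal below possible.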
Having all the processed triangles in one buffer allows the hardware to perform hidden surface removal - removing the triangles that will end up being hidden/overdrawn by other triangles. This significantly reduces the number of rendered triangles, resulting in improved performance and reduced power consumption.
Hidden surface removal typically uses something called a Tag Buffer as well as a depth buffer (both are small on-chip memories, since they only need to store one tile at a time).
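A minimal sketch of per-tile hidden surface removal for opaque triangles, assuming a simple two-pass design: the first pass depth-tests every triangle in the tile and records the "tag" (the nearest triangle) per pixel, and the second pass shades each covered pixel exactly once. The helper names, the `<` depth comparison and pixel-center sampling are assumptions; real hardware also has to handle alpha-tested and blended geometry, which this ignores.

```python
from collections import namedtuple

# Hypothetical triangle record: screen-space vertices (x, y, z) plus an id.
Triangle = namedtuple("Triangle", ["tri_id", "verts"])

def edge(ax, ay, bx, by, px, py):
    # Signed area term: which side of edge a->b the point p lies on.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def depth_at(verts, px, py):
    """Interpolated depth if (px, py) lies inside the triangle, else None."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = verts
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return None  # degenerate triangle covers nothing
    w0 = edge(x1, y1, x2, y2, px, py) / area
    w1 = edge(x2, y2, x0, y0, px, py) / area
    w2 = edge(x0, y0, x1, y1, px, py) / area
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None
    return w0 * z0 + w1 * z1 + w2 * z2

def render_tile_with_hsr(tile, tris, tile_size, shade):
    tx, ty = tile
    # Both buffers hold a single tile, standing in for the small on-chip memories.
    depth = [[float("inf")] * tile_size for _ in range(tile_size)]
    tag = [[None] * tile_size for _ in range(tile_size)]
    depth_passes = 0  # what an IMR with an early depth test would shade in this order
    # Pass 1: depth-test every triangle and remember the visible one; no shading yet.
    for tri in tris:
        for j in range(tile_size):
            for i in range(tile_size):
                px = tx * tile_size + i + 0.5
                py = ty * tile_size + j + 0.5
                z = depth_at(tri.verts, px, py)
                if z is not None and z < depth[j][i]:
                    depth[j][i] = z
                    tag[j][i] = tri
                    depth_passes += 1
    # Pass 2: shade each covered pixel once, using only its visible triangle.
    shaded = 0
    for j in range(tile_size):
        for i in range(tile_size):
            if tag[j][i] is not None:
                shade(tag[j][i], tx * tile_size + i + 0.5, ty * tile_size + j + 0.5)
                shaded += 1
    return shaded, depth_passes
```

With a back triangle submitted before a front one that covers the same pixels, an IMR with an early depth test would shade both (twice the pixel count), while this version shades each pixel once, no matter the submission order.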
Not sure why you're saying that PowerVR doesn't use a depth buffer. My guess is that it is just a "marketing" way of saying that there is no need to perform expensive reads from and writes to system memory in order to do the depth test.
P.S. Just to add to Tommy's answer, the primary benefits of tile-based deferred rendering are:
- Since fragments are processed one tile at a time, all color/depth/stencil buffer reads and writes are performed in fast on-chip memory. While the color buffer still has to be read from/written to system memory once per tile, in many cases the depth and stencil buffers need to be written to system memory only if they are required later (as in your use case). System memory traffic is a significant source of power consumption, so you can see how this reduces power consumption.
- Deferred rendering enables hidden surface removal. Fewer rendered triangles means less fragment processing, which in turn means fewer texture memory accesses.
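To put rough numbers on the first point, here is a back-of-the-envelope model. It is a sketch under strong simplifying assumptions (hypothetical function names, 4-byte color and depth, worst-case IMR with no caches, no framebuffer compression and no early rejection), so it overstates the IMR side and only shows where the difference comes from.

```python
def tbdr_external_bytes(width, height, color_bpp=4, depth_bpp=4,
                        load_color=False, store_depth=False):
    """System-memory framebuffer traffic for one frame on a TBDR GPU, assuming
    depth/stencil live only in on-chip tile memory unless explicitly stored."""
    pixels = width * height
    total = pixels * color_bpp            # each finished tile's color is written out once
    if load_color:
        total += pixels * color_bpp       # previous contents read back if not cleared
    if store_depth:
        total += pixels * depth_bpp       # depth written out only if needed later
    return total

def imr_external_bytes(width, height, overdraw, color_bpp=4, depth_bpp=4):
    """Worst-case IMR model: every fragment reads and writes depth and writes
    color directly in system memory."""
    fragments = width * height * overdraw
    return fragments * (2 * depth_bpp + color_bpp)

tbdr = tbdr_external_bytes(1920, 1080)
imr = imr_external_bytes(1920, 1080, overdraw=2)
print(tbdr, imr)  # 8294400 49766400
```

Setting `store_depth=True` models the case where the depth buffer is needed after the pass, which adds one full depth write per frame but still no per-fragment traffic.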