
I understand mipmapping pretty well. What I do not understand (at the hardware/driver level) is how mipmapping improves the performance of an application (at least this is often claimed). The driver does not know which mipmap level is going to be accessed until the fragment shader is executed, so all mipmap levels need to be present in VRAM anyway, or am I wrong?

What exactly is causing the performance improvement?

matthias_buehlmann
  • What makes you think that something that just occupies some RAM has any influence on performance? It just sits there, being passive. Bandwidth is only consumed if data is actively fetched from RAM. – datenwolf Oct 01 '13 at 20:02

2 Answers


You are no doubt aware that each texel in the lower LODs of the mip-chain covers a higher percentage of the total texture image area, correct?

When you sample a texture at a distant location the hardware will use a lower LOD. When this happens, the sample neighborhood necessary to resolve minification becomes smaller, so fewer (uncached) fetches are necessary. It is all about the amount of memory that actually has to be fetched during texture sampling, and not the amount of memory occupied (assuming you are not running into texture thrashing).
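To make "the hardware will use a lower LOD" a bit more concrete, here is a minimal C++ sketch of the LOD-selection math as the OpenGL specification describes it: the level is derived from how many texels the texture coordinates move per screen pixel. This is only an illustration of the formula, not the actual (vendor-specific) hardware path, and the function name and example numbers are invented for the demonstration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Sketch of mip LOD selection following the OpenGL spec's minification
// formula. Real hardware derives the derivatives per 2x2 pixel quad;
// here they are simply passed in.
float selectLod(float dudx, float dvdx,   // change of (u, v) per pixel step in x
                float dudy, float dvdy,   // change of (u, v) per pixel step in y
                float texWidth, float texHeight)
{
    // Scale the derivatives from normalized [0, 1] texture space to texels.
    float dx = std::hypot(dudx * texWidth, dvdx * texHeight);
    float dy = std::hypot(dudy * texWidth, dvdy * texHeight);

    // rho = how many texels one screen pixel spans; LOD = log2(rho).
    float rho = std::max(dx, dy);
    return std::log2(std::max(rho, 1.0f)); // clamp so we never go below level 0
}

int main()
{
    // A distant, minified surface: one pixel covers ~8 texels of a
    // 1024x1024 texture, so sampling lands around mip level 3 (1024 -> 128).
    float lod = selectLod(8.0f / 1024.0f, 0.0f,
                          0.0f, 8.0f / 1024.0f,
                          1024.0f, 1024.0f);
    std::printf("selected LOD ~ %.1f\n", lod); // prints ~3.0
}
```

At that level, a 2x2 linear filter footprint covers neighbouring texels of the 128x128 level instead of texels that are 8 apart in the 1024x1024 base level, which is exactly the "fewer (uncached) fetches" effect described above.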

I think this probably deserves a visual representation, so I will borrow the following diagram from the excellent series of tutorials at arcsynthesis.org.

[Diagram from arcsynthesis.org: sampling at a single LOD (left) vs. mipmapped sampling (right)]

On the left, you see what happens when you naïvely sample at a single LOD all of the time (this diagram shows linear minification filtering, by the way), and on the right you see what happens with mipmapping. Not only does this improve image quality by more closely matching the fragment's effective size, but because lower mipmap LODs contain fewer texels, they can be cached much more efficiently.
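To see how "fewer texels" turns into better cache behaviour, here is a rough C++ sketch that simply counts how many distinct cache lines one row of screen pixels touches when a minified surface is sampled from the base level versus from the matched mip level. It assumes 4 bytes per texel, 64-byte cache lines and a linear row-major layout; real GPUs tile/swizzle their textures and have more elaborate cache hierarchies, but the locality argument is the same.

```cpp
#include <cstdint>
#include <cstdio>
#include <set>

// Count distinct 64-byte cache lines touched by 256 horizontally adjacent
// screen pixels, given how many texels one pixel advances in the texture.
static int cacheLinesTouched(int texWidth, float texelsPerPixel)
{
    std::set<std::uint64_t> lines;
    for (int px = 0; px < 256; ++px) {
        int texelX = static_cast<int>(px * texelsPerPixel) % texWidth;   // wrap like a tiled texture
        std::uint64_t byteAddr = static_cast<std::uint64_t>(texelX) * 4; // 4 bytes per texel
        lines.insert(byteAddr / 64);                                     // 64-byte cache lines
    }
    return static_cast<int>(lines.size());
}

int main()
{
    // Distant surface: each screen pixel steps 8 texels across the 1024-wide base level.
    std::printf("base level (LOD 0): %d cache lines\n", cacheLinesTouched(1024, 8.0f)); // ~64 lines
    // Matched mip (level 3, 128 texels wide): each pixel steps ~1 texel.
    std::printf("matched mip (LOD 3): %d cache lines\n", cacheLinesTouched(128, 1.0f)); //  ~8 lines
}
```

Roughly eight times fewer lines have to be fetched from memory for the same row of pixels, which is where the bandwidth saving comes from.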

Andon M. Coleman
  • To elaborate on this point, most high-performance 3D applications are bandwidth-limited. By reducing the bandwidth requirements for texture sampling (by reusing cached texel values), you can speed up the critical path. For applications that are arithmetic or CPU bound, MIP-mapping does not improve performance. – MooseBoys Oct 01 '13 at 23:56
  • So - the performance gain comes solely from cache efficiency? – matthias_buehlmann Oct 03 '13 at 23:36
  • @user1282931: Yes, that is correct. Have a look here, it basically restates what I said a little more clearly: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0555a/CHDEIJID.html – Andon M. Coleman Oct 04 '13 at 15:37
  • Thanks! One last question - what exactly is 'memory bandwidth'? Is it the bandwidth between the CPU and VRAM, or the bandwidth between VRAM and the shader units? – matthias_buehlmann Oct 04 '13 at 20:40
  • @user1282931: Most often, memory bandwidth in graphics hardware refers to VRAM -> shader. CPU to VRAM is considered bus traffic in most _high-performance GPU_ (e.g. non-integrated) architectures (think of the Xbox 360 and the next generation of game consoles as more of an exception than the rule; they share the same memory interface for CPU and GPU), which has its own set of bandwidth limitations. But it really does refer to any time memory has to be fetched; whether it comes from cache, VRAM, or a system bus like PCIe simply introduces a hierarchy of increasingly slower sources. – Andon M. Coleman Oct 04 '13 at 21:17

Mipmaps are useful for at least two reasons:

  • visual quality - scenes look much better in the distance; there is more blur (which usually looks better than flickering pixels). Additionally, anisotropic filtering can be used, which improves visual quality a lot.
  • performance - since distant objects can be sampled from a smaller mip level, the whole operation should be faster: sometimes the entire mip level fits in the texture cache. This is called cache coherency.
  • in general, mipmaps need only about 33% more memory, so it is a fairly low cost for better quality and a potential performance gain (a quick sketch of this arithmetic follows the link below). Note that the real performance improvement should be measured for a particular scene structure.

see info here: http://www.tomshardware.com/reviews/ati,819-2.html
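As a quick sanity check on the "33% more memory" figure: each mip level has a quarter of the texels of the level above it, so the whole chain is a geometric series converging to 4/3 of the base level. A small sketch of that arithmetic, assuming a square 1024x1024 texture (the exact overhead is always slightly under one third):

```cpp
#include <cstdio>

int main()
{
    // Sum the texel counts of a full mip chain for a 1024x1024 texture.
    // Each level has half the width and half the height of the previous
    // one, i.e. a quarter of the texels: 1 + 1/4 + 1/16 + ... -> 4/3.
    long long baseTexels  = 1024LL * 1024LL;
    long long totalTexels = 0;
    long long w = 1024, h = 1024;
    while (true) {
        totalTexels += w * h;
        if (w == 1 && h == 1) break;
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
    std::printf("mip chain / base level = %.4f\n",
                double(totalTexels) / double(baseTexels)); // prints ~1.3333
}
```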

fen
  • This isn't quite right. The reason that "the whole operation [is] faster" is that when you access MIP levels whose texels are appropriately sized to the UV-derivatives of the rasterized pixels (auto-selected MIP level), you can take better advantage of the cache due to spatial locality. If your texels are mismatched (specifically, smaller), each sample is more likely to result in a cache miss, and so require a read from video memory. This applies regardless of the actual size of the MIP level. – MooseBoys Oct 02 '13 at 00:02
  • Added a link to the discussion from arcsynthesis: http://www.arcsynthesis.org/gltut/Texturing/Tut15%20Performace.html – fen Oct 02 '13 at 06:56