How does SLI decide which VBO belongs to which GPU in OpenGL?

Question

I am aiming to use SLI in AFR mode to increase FPS. I am under the impression that the NVIDIA SLI driver will automatically and intelligently allocate VBOs to the individual GPUs. Is this correct?

My code renders a large number of vertices/faces through a VAO with three VBOs (vertices, colors, indices). There is no FPS increase when using two GPUs with SLI.

I duplicated the VAO and VBOs with the same vertices/faces and alternated the glDrawElements call between the two VAOs, hoping the NVIDIA SLI driver would be clever enough to assign one VAO to each GPU, but there is still no FPS increase. Can someone tell me what I did wrong?

I also tried commenting out the glDrawElements call for one of the VAOs. As expected, that doubles the FPS, and the output flickers between the actual scene and a black screen.

What framerate figures do you get? If you're drawing very simple geometry, then you're probably capped by the CPU and not the GPU. Also, SLI can flex its muscles only if you have a fairly complex scene where rendering a single frame takes one GPU longer than one monitor refresh cycle. – datenwolf, Apr 13 '15 at 11:18
@datenwolf The FPS I am getting is around 25~30. The scene comes from a ply file and I am drawing it as is. There are a lot of vertices/faces but no lighting, shadows, etc., so I am not sure if this counts as "complex". One thing I am sure of is that if I load only half of the vertices/faces, the FPS doubles. Shouldn't SLI help in this kind of scenario? – user3667089, Apr 13 '15 at 16:39

Dimo Markov · Answer 1 · 2015-04-21T18:18:10.290

1

As mentioned here,

It's noteworthy that while the frequency at which frames arrive may be doubled, the time to produce the frame is not reduced
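The distinction in that quote between frame rate and frame time can be illustrated with a deliberately simplified model. This is only a sketch: the function name `afr_estimate`, the assumption of perfect scaling across GPUs, and the optional CPU cap (which reflects the CPU-bound case raised in the comments on the question) are all illustrative assumptions, not a description of how the driver actually schedules work.

```python
def afr_estimate(gpu_frame_ms, n_gpus=2, cpu_frame_ms=0.0):
    """Idealised AFR model (an assumption, not measured driver behaviour).

    Each GPU renders whole frames, so the time to produce one frame
    (latency) stays at gpu_frame_ms. Only the rate at which finished
    frames arrive can scale with the number of GPUs.
    """
    # Best case: n_gpus frames complete in every gpu_frame_ms window.
    gpu_fps = n_gpus * 1000.0 / gpu_frame_ms
    if cpu_frame_ms > 0:
        # The CPU must still submit every frame, so it caps the rate.
        fps = min(gpu_fps, 1000.0 / cpu_frame_ms)
    else:
        fps = gpu_fps
    return fps, gpu_frame_ms


if __name__ == "__main__":
    for gpus in (1, 2):
        fps, latency = afr_estimate(40.0, n_gpus=gpus)
        print(f"{gpus} GPU(s): {fps:.1f} fps, {latency:.1f} ms per frame")
```

In this model a 40 ms frame stays a 40 ms frame regardless of the GPU count, and if the CPU also needs 40 ms to submit a frame, the second GPU adds nothing.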

Additionally, I have never heard of VAOs or VBOs being dedicated to one GPU. As far as I know, both adapters hold the same cloned buffers. Duplication happens without you even knowing it, and each GPU uses its own copy to produce its part of the frame. I may be wrong, but I doubt it.

That is why two 2 GB VRAM adapters don't give you 4 GB of VRAM; you are still working with 2 GB. Also, if your SLI adapters have different capacities, the bigger card's usable memory is reduced to match the smaller one. All the performance boost you get comes from the parallel processing power of the two GPUs and the fact that your memory bandwidth is twice as big. Memory writes are multicast in hardware, as far as I know, so there is no big overhead there.
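The memory rule described here amounts to taking the minimum rather than the sum. A minimal sketch follows, where `effective_vram_gb` is a hypothetical helper that encodes the mirroring behaviour as this answer describes it, not something queried from the driver:

```python
def effective_vram_gb(capacities_gb):
    """Usable VRAM when every GPU holds a full copy of all buffers.

    Assumption taken from the answer: buffers are mirrored on each
    adapter, so the smallest card sets the limit and capacities
    do not add up.
    """
    if not capacities_gb:
        raise ValueError("need at least one adapter")
    return min(capacities_gb)


if __name__ == "__main__":
    print(effective_vram_gb([2, 2]))  # two 2 GB cards
    print(effective_vram_gb([4, 2]))  # mismatched cards
```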

EDIT: [Read these interesting points about SFR and AFR](http://http.download.nvidia.com/developer/presentations/2005/GDC/OpenGL_Day/OpenGL_SLI.pdf). Turns out that AFR is recommended for heavy vertex load, while SFR is better for pixel shader load. That was an interesting find even for me. When using AFR, you should also make sure that you're double buffered to get the most out of it. Lack of multiple buffers literally kills AFR. Turn your vsync OFF - it kills it too!

edited Apr 21 '15 at 18:18

answered Apr 14 '15 at 09:29


I switched to SFR mode and there is no performance boost. Are you suggesting the fps might be limited by the memory bandwidth? It took a long time to copy the data from CPU memory to VBO, but this is a one time operation. Could you elaborate on why the fps is limited by the memory bandwidth? – user3667089 Apr 18 '15 at 04:37

Are you still "duplicating the VAO and VBOs with the same vertices/faces and alternate the glDrawElements call between the two VAOs hoping the NVIDIA SLI driver will be clever enough to know one VAO is for one GPU" ? – Dimo Markov Apr 19 '15 at 18:45
I tried both non-duplicate and duplicate, and the FPS is still the same – user3667089 Apr 19 '15 at 19:50

Have you considered checking your FPS calculation code, process priority settings, sleep cap, etc.? If none of those is the problem, I guess your biggest bottleneck is not rendering but memory throughput. You should consider compressing your vertices. What kind of geometry are you rendering? Is it confined to some constant range? Do you need colors? How many indices are there? – Dimo Markov Apr 20 '15 at 08:33
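One common way to compress vertices whose coordinates stay within a known constant range is to quantize each float to a 16-bit integer. The sketch below is an illustration only: the helper names, the 16-bit width, and the range `[-1, 1]` are assumptions, not details from this thread.

```python
def quantize_u16(x, lo, hi):
    """Map x in [lo, hi] to an integer in 0..65535 (out-of-range values clamp)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (x - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    return int(round(t * 65535))


def dequantize_u16(q, lo, hi):
    """Approximate inverse of quantize_u16."""
    return lo + (q / 65535) * (hi - lo)


if __name__ == "__main__":
    lo, hi = -1.0, 1.0            # assumed bounding range of the model
    q = quantize_u16(0.3, lo, hi)
    print(q, dequantize_u16(q, lo, hi))
```

Storing three such values per position takes 6 bytes instead of the 12 used by three 32-bit floats, at the cost of a rounding error of at most half a quantization step, `(hi - lo) / 65535 / 2`.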
Is the memory bandwidth limitation you are referring to the transfer from the VBO to the screen (is there a separate thing such as display memory)? The memory bandwidth of the GPU I am using is 336 GB/sec. There are about 50M indices with colors. I know it's an insane amount and I could possibly use Buffer Object Streaming to deal with this, but I would like to know if better hardware or SLI will simply solve it. – user3667089 Apr 21 '15 at 06:02

What does 'indices with colors' mean? Additionally, 50 million indices take up as much as 190 MB of VRAM, which is not that fatal. How many vertices do you have? Do you have colors, texture coordinates, normals, etc. associated with these vertices? I doubt that anything other than better hardware will be able to take on big data. However, you haven't even mentioned how much data you are actually working with. – Dimo Markov Apr 21 '15 at 12:27

When dealing with graphical optimization, there are three things you need to do: 1. precompute, 2. compress, 3. use all the hardware capabilities that you can squeeze out. I don't know if SLI would work, because there are too many conditions. Are you on a laptop? Do you have SLI turned on in your NVIDIA Control Panel? Are you equipped with two identical adapters? Are you on High Performance power mode? Are you allocating more VRAM than you have? – Dimo Markov Apr 21 '15 at 12:31

I have 25M vertices and 50M indices, and each vertex has an 8x3-bit vertex color. No textures, no normals. I am simply loading data from a ply file and displaying it. I am on a desktop, SLI is turned on, I am equipped with two GTX Titan Blacks on high performance power mode, there are 6 GB of VRAM in each, and GPU-Z shows I am not using VRAM over the limit. – user3667089 Apr 21 '15 at 16:54
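As a rough check on the figures in this exchange, the buffer sizes can be estimated directly. This is a sketch under stated assumptions: positions are taken to be three 32-bit floats (12 bytes), colors 3 tightly packed bytes, indices 32-bit (16-bit indices cannot address 25M vertices), and 336 GB/s is read as 336 × 10^9 bytes per second. The function names are hypothetical.

```python
MIB = 2 ** 20


def buffer_sizes(n_vertices, n_indices,
                 pos_bytes=12, color_bytes=3, index_bytes=4):
    """Byte size of each buffer under the assumed layouts."""
    return {
        "positions": n_vertices * pos_bytes,
        "colors": n_vertices * color_bytes,
        "indices": n_indices * index_bytes,
    }


def min_read_ms(total_bytes, bandwidth_gb_s):
    """Time to stream the data once at peak bandwidth, in milliseconds.

    A naive lower bound: it ignores caching, vertex reuse through the
    index buffer, framebuffer traffic and any driver padding.
    """
    return total_bytes / (bandwidth_gb_s * 1e9) * 1000.0


if __name__ == "__main__":
    sizes = buffer_sizes(25_000_000, 50_000_000)
    total = sum(sizes.values())
    for name, size in sizes.items():
        print(f"{name}: {size / MIB:.1f} MiB")
    print(f"total: {total / MIB:.1f} MiB")
    print(f"one pass at 336 GB/s: {min_read_ms(total, 336):.2f} ms")
```

Under these assumptions the index buffer comes to about 190.7 MiB, consistent with the 190 MB quoted earlier, and all three buffers together to roughly 548 MiB, far below the 6 GB per card. A single pass over that data at peak bandwidth would take under 2 ms, so this naive bound alone does not explain a frame time of 33 to 40 ms.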

Thanks for the info. Seems you're okay with memory. [Read these interesting points about SFR and AFR](http://http.download.nvidia.com/developer/presentations/2005/GDC/OpenGL_Day/OpenGL_SLI.pdf). Turns out that AFR is recommended for heavy vertex load, while SFR is better for pixel shader load. That was an interesting find even for me. When using AFR, you should also make sure that you're double (or even more?) buffered to get most out of it. Lack of multiple buffers literally kills AFR. Turn your vsync OFF - it kills it too! – Dimo Markov Apr 21 '15 at 18:01
