I am completely new to CUDA programming and want to make sure I understand some basic memory-related principles, because some of my own reasoning has left me a little confused.
I am working on a simulation that uses billions of single-use random numbers in the range 0 to 50.
After cuRAND fills a huge array with random floats between 0.0 and 1.0, I run a kernel that converts all of this float data to the desired integer range. From what I have learned, I had the feeling that storing five of these values in one unsigned int, using just 6 bits each, is better because of the very low bandwidth of global memory. So I did it.
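A minimal sketch of this kind of packing, for illustration only. The helper names toRange, pack5 and unpack are hypothetical, and the scaling by 51 with a clamp is an assumption about how the conversion could be done (cuRAND's uniform float generator returns values in (0.0, 1.0], so 1.0 has to be clamped). The HD macro only lets the same functions build both as plain C++ and as CUDA device code.

```cpp
#include <cassert>
#include <cstdint>

// Compile as host+device code under nvcc, as plain C++ otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Map a uniform float in (0, 1] to an integer in [0, 50].
// Assumption: scale by 51 and clamp the single case u == 1.0f.
HD inline uint32_t toRange(float u) {
    uint32_t v = static_cast<uint32_t>(u * 51.0f);
    return v > 50u ? 50u : v;
}

// Pack five values (each < 64, so 6 bits) into the low 30 bits of a word.
HD inline uint32_t pack5(const uint32_t v[5]) {
    uint32_t word = 0;
    for (int i = 0; i < 5; ++i) {
        assert(v[i] < 64u);          // must fit in 6 bits
        word |= v[i] << (6 * i);     // slot i occupies bits 6i .. 6i+5
    }
    return word;
}

// Extract the value stored in slot 0..4 of a packed word.
HD inline uint32_t unpack(uint32_t word, int slot) {
    return (word >> (6 * slot)) & 0x3Fu;
}
```

Each read of one 4-byte word then yields five values for the price of one shift and one mask per value.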
Now I have to store around 20000 read-only yes/no values that will be accessed randomly, with roughly equal probability, depending on the random values fed into the simulation.
First I thought about shared memory. Using one bit per value looked great until I realized that the more information sits in one bank, the more collisions there will be. So the solution seems to be to use an unsigned short (2 bytes each, about 40 KB total) to represent one yes/no value, using as much of the available memory as possible and so minimizing the probability of different threads reading the same bank.
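To make the bank reasoning concrete, here is a rough host-side sketch of how an array index maps to a bank. It assumes the common layout of 32 banks, each 4 bytes wide, and it assumes that threads reading the same 32-bit word are served by a broadcast, so only distinct words in the same bank count as a conflict. The names wordOf, bankOf and conflictDegree are hypothetical, and the model ignores everything else the hardware does.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr unsigned kBanks = 32;     // assumed number of shared memory banks
constexpr unsigned kBankBytes = 4;  // assumed bank width in bytes

// Index of the 32-bit word that holds element `index` of an array
// whose elements are `elemBytes` bytes wide.
HD inline unsigned wordOf(unsigned index, unsigned elemBytes) {
    assert(elemBytes == 1 || elemBytes == 2 || elemBytes == 4);
    return index * elemBytes / kBankBytes;
}

// Bank that serves that word: consecutive words go to consecutive banks.
HD inline unsigned bankOf(unsigned index, unsigned elemBytes) {
    return wordOf(index, elemBytes) % kBanks;
}

// Largest number of distinct words that one warp (32 indices) requests
// from a single bank. 1 means conflict-free, 32 is fully serialized.
inline unsigned conflictDegree(const unsigned idx[32], unsigned elemBytes) {
    unsigned worst = 0;
    for (unsigned b = 0; b < kBanks; ++b) {
        unsigned words[32];
        unsigned n = 0;
        for (int t = 0; t < 32; ++t) {
            const unsigned w = wordOf(idx[t], elemBytes);
            if (w % kBanks != b) continue;
            bool seen = false;
            for (unsigned k = 0; k < n; ++k) seen = seen || (words[k] == w);
            if (!seen) words[n++] = w;
        }
        if (n > worst) worst = n;
    }
    return worst;
}
```

Under these assumptions two neighbouring unsigned short elements share one word and therefore one bank, so the element width changes how many values sit in a bank, not how many banks exist.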
My other idea was to use constant memory and the L1 cache. Here, from what I have learned, the approach would be exactly the opposite of shared memory. Having several threads read the same location is now desirable, so putting 32 values in one 4-byte word is now optimal.
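A minimal sketch of the one-bit-per-value layout, with 32 flags per 4-byte word, so that 20000 flags need 625 words (2500 bytes). The names kNumFlags, kNumWords, setFlag and testFlag are hypothetical. In a CUDA build the table could, for example, be declared as a __constant__ array and filled from the host with cudaMemcpyToSymbol; that part is an assumption and is not shown here.

```cpp
#include <cassert>
#include <cstdint>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr unsigned kNumFlags = 20000;                 // yes/no values to store
constexpr unsigned kNumWords = (kNumFlags + 31) / 32; // 32 flags per word

// Host-side helper: mark flag i as "yes" while building the table.
inline void setFlag(uint32_t* words, unsigned i) {
    assert(i < kNumFlags);
    words[i >> 5] |= 1u << (i & 31u);   // word i / 32, bit i % 32
}

// Lookup usable on host or device: one load, one shift, one mask.
HD inline bool testFlag(const uint32_t* words, unsigned i) {
    assert(i < kNumFlags);
    return ((words[i >> 5] >> (i & 31u)) & 1u) != 0u;
}
```

The extra work per lookup is a shift and a mask on top of the single word load.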
So, based on the overall probabilities, I should decide between shared memory and cache, with shared memory probably being better, since with so many yes/no values and just tens or hundreds of threads per block there will not be many bank collisions.
But is my general understanding of the problem right? Are the processors really so fast compared to memory that optimizing memory accesses is crucial, and the cost of the extra instructions needed to extract data with operations like <<, >>, |=, &= is not important?
Sorry for the wall of text. There are a million ways I can make the simulation work, but only a few ways of doing it right. I just don't want to repeat some stupid mistake again and again only because I misunderstand something.