I checked and could find numerous posts about the performance of float vs. double (here is one, and here is another). In most cases, it is said that they have the same performance because the FPU converts both to 10-byte extended-precision reals. But I'm still not convinced. What if locality issues are taken into account? Consider doing a bitwise XOR on a large number of values and then counting the nonzero bits: this should take considerably less time when the data fits in the cache, which it will for float at half the size of double. Doing the XOR and bit population count with regular (non-SIMD) instructions makes processing take a lot longer. I tried to write a test to confirm this, but it is not easy to get everything right.
One question is: are these two types converted to the same size when they sit in the cache?
In general, I was wondering if anyone can characterize the behavior of these two choices in different situations.