
I'm observing a rather odd phenomenon: when I increase the number of CPU operations required from around 10 million to a few hundred million (mostly multiplications, additions, and divisions), computing them in float turns out to be much faster. However, below a certain (not so extreme) amount of work, integer computation is indeed faster, as expected.

Is there a particular reason why this happens? I suspect float operations may be getting parallelized automatically once the computation grows large enough, while integer operations are not. Note that I did not explicitly use multi-threading in the application. I'm no expert on Android, so I'm hoping an Android pro or computer architecture expert can enlighten me on this.
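Roughly, the timing comparison looks like the following sketch (a simplified stand-in with hypothetical loop bodies and counts, not my exact code):

```java
public class ArithBench {
    public static void main(String[] args) {
        final int N = 200_000_000;  // a few hundred million operations

        long t0 = System.nanoTime();
        int accI = 1;
        for (int i = 1; i <= N; i++) {
            accI = accI * 3 + i;     // integer multiply + add
        }
        long intNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        float accF = 1f;
        for (int i = 1; i <= N; i++) {
            accF = accF * 3f + i;    // float multiply + add
        }
        long floatNs = System.nanoTime() - t0;

        // Accumulators are printed so the JIT cannot dead-code-eliminate the loops.
        System.out.println("int:   " + intNs / 1e6 + " ms (acc=" + accI + ")");
        System.out.println("float: " + floatNs / 1e6 + " ms (acc=" + accF + ")");
    }
}
```

(The integer accumulator will wrap around at this count; that is fine for timing purposes.)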

Thank you.

kwotsin
  • What operations are you doing? Do you divide? If so, that would explain it; see: https://stackoverflow.com/questions/3350808/int-vs-float-arithmetic-efficiency-in-java – AsfK Aug 30 '17 at 15:38
  • Yes, division is involved as well. In fact, the application is related to neural nets, which essentially means graph computations as in the thread you posted. Is there existing literature showing that some integer operations (e.g. division) are slower? – kwotsin Aug 30 '17 at 15:43
  • I never tested it, but you can test it easily in simple Java code (using timestamps). Anyway, I found http://nicolas.limare.net/pro/notes/2014/12/12_arit_speed/ very interesting :) Additionally, using the GPU should speed up the process. – AsfK Aug 30 '17 at 15:48
  • Division is always the slowest operation ever, for a processor. To optimize, you can *multiply by the precalculated inverse*. I.e.: `10 * .5` is **much faster** than `10 / 2` and the result is the same (5). Even more optimized is *shifting the bits* (if you are to multiply or divide by a power of 2). I.e.: `10 << 1` is **MUUUUUCH FASTER** than `10 * 2` and the result is the same (20). – Phantômaxx Aug 30 '17 at 15:53
  • @ModularSynth While both of those are true, the compiler should do both of those optimizations for you. Also, the shifting thing is only true if you are multiplying by a constant, and the constant has relatively few bits. If the number is a variable or has too many 1 bits, the hardware multiplier will be faster. – Gabe Sechan Aug 30 '17 at 15:57
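To make the strength-reduction tricks in the comments above concrete, here is a small sketch of those equivalences (illustrative values only; as noted, the compiler usually does this for you):

```java
public class StrengthReduction {
    public static void main(String[] args) {
        int x = 10;

        // Division replaced by multiplication with the precomputed reciprocal:
        double half = 1.0 / 2.0;
        System.out.println(x / 2);            // 5
        System.out.println((int) (x * half)); // 5

        // Multiplication/division by a power of two as a bit shift:
        System.out.println(x * 2);   // 20
        System.out.println(x << 1);  // 20
        System.out.println(x / 4);   // 2
        System.out.println(x >> 2);  // 2  (careful: >> rounds toward negative
                                     //  infinity, so it differs from / for
                                     //  negative operands)
    }
}
```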

1 Answer


Processors these days have built-in parallel floating-point instructions (called vector, or SIMD, instructions). If you're doing a lot of FP operations, the compiler could be optimizing your code to use them. See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/a64_simd_vector_alpha.html for a list of the built-in CPU operations.
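For illustration, a loop with a vector-friendly shape (independent per-element float work, no branches, no cross-iteration dependencies) might look like this sketch; the `axpy` name and the values are just for the example:

```java
public class VecFriendly {
    // Each iteration touches a different element and depends on no other
    // iteration -- the shape a vectorizing compiler/JIT can map onto SIMD lanes.
    static void axpy(float a, float[] x, float[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f, 4f};
        float[] y = {0f, 0f, 0f, 0f};
        axpy(2f, x, y);
        System.out.println(java.util.Arrays.toString(y)); // [2.0, 4.0, 6.0, 8.0]
    }
}
```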

Gabe Sechan
  • Thanks for sharing this!!! I am indeed using an ARM processor from Snapdragon, and this is something I have never really thought of. Is there a way to check whether my mobile device is using SIMD operations by default? – kwotsin Aug 30 '17 at 15:47
  • If it's C/C++ code, you can look at the disassembly in the object files and check what instructions were generated. If it's Java, you're pretty much out of luck unless you want to study how the JIT environment of the device compiles bytecode to machine code. – Gabe Sechan Aug 30 '17 at 15:48
  • Unfortunately, I'm using Java instead, so it will be tough. I just checked, and Snapdragon processors seem to have actively used SIMD vector operations since more than 5 years back. Also, is there a particular reason why integer operations can't be vectorized? – kwotsin Aug 30 '17 at 16:00
  • They can be. Intel has SSE and AVX, which are both instruction sets for vectorized integers. But the most common use case for vectorized math is graphics, which uses floating point. In addition, most integer code includes branching as flow control, so it cannot be parallelized well. So it's not a high priority for processor manufacturers. – Gabe Sechan Aug 30 '17 at 16:06