I need to use the modulo operation inside a kernel and it is slowing things down. It is impossible for me to remove it. Basically I have a%b where b is not a power of 2. Is there any way to avoid using it?
-
I'm confused. You state that it is impossible for you to remove but you want to avoid using it? – Hopeful Llama Dec 28 '13 at 18:41
-
Is `b` a compile-time constant expression? – Kerrek SB Dec 28 '13 at 18:43
-
Had a similar issue on an AMD CPU (extremely slow `%` operation). Switching to Intel was the solution, really – notnull Dec 28 '13 at 19:03
-
What `arch` are you using? Do you have to use C? Could you use inline asm? – Jonathan Ben-Avraham Dec 28 '13 at 19:21
-
It is likely that your compiler will be using the fastest possible implementation on your platform. The only way to make it faster is if you have some particular constraints you can leverage. – Oliver Charlesworth Dec 28 '13 at 19:24
1 Answer
Can you precompute the results and use a lookup table? Instead of
c = a%b;
you could then try
c = table[a][b];
You will have to think about the signedness of the operands and the size of the table. Depending on the overall use case, you could move this table to a higher level and remove more than just this single computation.
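To make the table idea concrete, here is a minimal sketch in C. It assumes both operands are unsigned and small; the bounds `MAX_A` and `MAX_B` and the names `mod_table_init` and `mod_lookup` are made up for illustration and would have to match your real value ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds: the table is only practical when both operands are small. */
#define MAX_A 256u   /* a is assumed to lie in [0, MAX_A) */
#define MAX_B 16u    /* b is assumed to lie in [1, MAX_B) */

/* 256 * 16 one-byte entries = 4 KiB; every remainder is < MAX_B, so uint8_t is enough. */
static uint8_t mod_table[MAX_A][MAX_B];

/* Fill the table once, outside the hot loop. Column 0 stays unused (b == 0 is invalid). */
void mod_table_init(void)
{
    for (uint32_t a = 0; a < MAX_A; ++a)
        for (uint32_t b = 1; b < MAX_B; ++b)
            mod_table[a][b] = (uint8_t)(a % b);
}

/* Replaces a % b with a single memory load; valid only inside the assumed bounds. */
uint32_t mod_lookup(uint32_t a, uint32_t b)
{
    assert(a < MAX_A && b >= 1 && b < MAX_B);
    return mod_table[a][b];
}
```

Whether the load actually beats the division depends on the table staying in cache, so it is worth measuring on the target before committing to it.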
A custom implementation of modulo would start from its definition:
(a/b)*b + a%b == a; //true
a%b == a - (a/b)*b // true
Depending on the likely values for a and b you could try to optimize this.
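As a rough sketch of what that could look like in C: the first function is just the identity above, and the second exploits one possible constraint, namely that `a` is known to be smaller than `2*b` (for example, the sum of two values that were already reduced modulo `b`). Both function names and the constraint itself are assumptions for illustration, not something taken from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder via the identity a % b == a - (a / b) * b; b must be nonzero.
   This still performs a division, so on its own it is unlikely to beat
   the compiler's own %. */
uint32_t mod_by_definition(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return a - (a / b) * b;
}

/* Assumed constraint: a < 2 * b. Then a single compare-and-subtract
   gives the remainder and no division is needed. */
uint32_t mod_small_range(uint32_t a, uint32_t b)
{
    /* Overflow-safe check of a < 2 * b. */
    assert(b != 0 && (a < b || a - b < b));
    return (a >= b) ? a - b : a;
}
```

The second variant is only correct while the range assumption holds, which is why it is asserted rather than silently trusted.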
Depending on your target hardware, you could check whether there is a fast hardware solution for this on your specific product. (see this)
There may be more solutions out there.
-
The size of this table quickly blows up and stamps all over cache, though. – Oliver Charlesworth Dec 28 '13 at 19:23
-
-
Yup, particularly for embedded platforms without hardware div/mod, or with "flat" memory hierarchy, and for "small" input ranges. – Oliver Charlesworth Dec 28 '13 at 20:02