I have a `double x` and a `double y`. I need to turn them into an `int boxnum`, defined as the (floored) index of the square in which `(x,y)` falls in the `WIDTH` x `HEIGHT` grid of squares of size `BOX_SIZE`. Coordinates in excess of `WIDTH` are wrapped back around; ditto for `HEIGHT`.
I am currently using:
    ( (((int)(x)) / BOX_SIZE) % WIDTH + WIDTH * ((((int)(y)) / BOX_SIZE) % HEIGHT) )
This statement currently eats something like 20% of my execution time, and that gets even worse (on the order of 40-50%) if I make it completely safe for negative coordinates:
    ( ((((int)(x)) / BOX_SIZE) % WIDTH + WIDTH) % WIDTH
      + WIDTH * (((((int)(y)) / BOX_SIZE) % HEIGHT + HEIGHT) % HEIGHT) )
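Spelled out as a function, the safe version is (a sketch; `boxnum_of` is just an illustrative name, and the constants stand in for my compile-time settings):

    #define BOX_SIZE 1
    #define WIDTH    100
    #define HEIGHT   100

    /* The negative-safe computation above, written out. */
    static int boxnum_of(double x, double y)
    {
        int bx = (((int)x / BOX_SIZE) % WIDTH  + WIDTH)  % WIDTH;   /* wrap x into [0, WIDTH)  */
        int by = (((int)y / BOX_SIZE) % HEIGHT + HEIGHT) % HEIGHT;  /* wrap y into [0, HEIGHT) */
        return bx + WIDTH * by;
    }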
I'm actually considering completely converting the application to fixed-point, just to avoid this, so that I can bitmask out the part I want instead of having this horrible conversion.
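To make that concrete: assuming power-of-two `WIDTH` and `HEIGHT`, `BOX_SIZE = 1`, and coordinates stored with some number of fractional bits (`SHIFT` and `boxnum_fixed` below are hypothetical names, not existing code), the conversion would reduce to a shift and a mask:

    #include <stdint.h>

    #define SHIFT  16               /* hypothetical count of fractional bits */
    #define WIDTH  256              /* power of two, so wrapping is a mask   */
    #define HEIGHT 256

    /* Sketch: flooring is a right shift, wrapping is a bitmask.
       Assumes BOX_SIZE = 1 and two's-complement arithmetic. */
    static int boxnum_fixed(int32_t fx, int32_t fy)
    {
        int bx = (fx >> SHIFT) & (WIDTH  - 1);
        int by = (fy >> SHIFT) & (HEIGHT - 1);
        return bx + WIDTH * by;
    }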
Is there a better way of doing a double->int conversion of this kind? Would it be worth it to ensure that `0 < x < WIDTH*BOX_SIZE` and `0 < y < HEIGHT*BOX_SIZE`, so that I can drop the two remainder operations? (Doing so is difficult enough that it isn't worthwhile just for the benchmark, unless it is likely to be a significant improvement.)
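For reference, with those bounds guaranteed, the expression would collapse to just (sketch):

    /* Valid only under the assumption 0 <= x < WIDTH*BOX_SIZE
       and 0 <= y < HEIGHT*BOX_SIZE: no wrapping needed. */
    int boxnum = (int)x / BOX_SIZE + WIDTH * ((int)y / BOX_SIZE);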
EDIT: After appropriate chastisement in the comments, more details:
`x` and `y` are the coordinates of a set of (as many as 10^6) particles. I am using an algorithm which requires me, at every time step, to do some simple sums across all particles within a box. So I loop across the particles, calculate which box each particle is in, and then use that as the array index for adding to that box. Particles often move far enough that their past locations are no indication of their future ones. They are also unordered, which means I can make no assumptions based on ordering.
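For concreteness, the per-timestep pass is essentially the following (a sketch with illustrative names like `pval` and `box_sum`, not my actual code; `boxnum_of` is the function written out above):

    #define NPART  1000000          /* up to 10^6 particles */
    #define NBOXES (WIDTH * HEIGHT)

    double px[NPART], py[NPART], pval[NPART];  /* coordinates and the summed quantity */
    double box_sum[NBOXES];

    /* One timestep: each particle finds its box and adds its
       contribution to that box's running sum. */
    for (int i = 0; i < NPART; i++) {
        int b = boxnum_of(px[i], py[i]);
        box_sum[b] += pval[i];
    }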
`WIDTH`, `HEIGHT`, and `BOX_SIZE` are technically free, as long as `WIDTH` and `HEIGHT` are even multiples of `BOX_SIZE`. In practice they are all specified at compile time, and they are integers with `BOX_SIZE = 1`. I have run everything from `WIDTH = HEIGHT = 4` to `WIDTH = HEIGHT = 512`, and while I'm usually on a square power of 2 (because why not?), `WIDTH = 37; HEIGHT = 193` should work without issues.
This calculation is unavoidably performed once per particle per timestep; in the current implementation it is performed twice. I tried caching the value to avoid recalculation, but the end benchmark performed worse, so I went back to calculating it twice.
A basic test run with 10 particles/box * 100 `WIDTH` * 100 `HEIGHT` * 10000 steps = 1 billion particle*timesteps runs in a shade over a minute.
These coordinates are "regular numbers" on the order of 1-1000, so I'm nowhere near any kind of bound on `double`.