Optimization of Point to Voxel mapping

Question

I used a profiler to look over some code which does not yet run fast enough. It found that the following function took most of the time, and that half of the time in this function was spent in floor. Now, there are two possibilities: optimizing this function, or going one level up and reducing the number of calls to it. I wonder if the first one is possible.

int Sph::gridIndex (Vector3 position) const {
    int mx = ((int)floor(position.x / _gridIntervalSize) % _gridSize);
    int my = ((int)floor(position.y / _gridIntervalSize) % _gridSize);
    int mz = ((int)floor(position.z / _gridIntervalSize) % _gridSize);

    if (mx < 0) {
        mx += _gridSize;
    }
    if (my < 0) {
        my += _gridSize;
    }
    if (mz < 0) {
        mz += _gridSize;
    }

    int x = mx * _gridSize * _gridSize;
    int y = my * _gridSize;
    int z = mz * 1;
    return x + y + z;
}

Vector3 is just some simple class which stores three floats and provides some overloaded operators. _gridSize is of type int and _gridIntervalSize is a float. There are _gridSize ^ 3 buckets.

The purpose of the function is to provide hash table support. Every 3D point is mapped to an index, and points which lie in the same voxel of size _gridIntervalSize ^ 3 should land in the same bucket.
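To make the bucket layout concrete, here is a minimal sketch of the container this mapping implies. It assumes a plain Vector3 with three public floats, and the names SpatialHashGrid, insert and bucket are hypothetical; gridIndex reproduces the question's arithmetic, and each bucket stores particle indices.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for the question's Vector3: three public floats (assumption).
struct Vector3 { float x, y, z; };

// Hypothetical container: gridSize^3 buckets, each holding the indices of
// the particles whose voxel hashes to it.
class SpatialHashGrid {
public:
    SpatialHashGrid(int gridSize, float gridIntervalSize)
        : gridSize_(gridSize), gridIntervalSize_(gridIntervalSize),
          buckets_(static_cast<std::size_t>(gridSize) * gridSize * gridSize) {}

    // Same mapping as the question's gridIndex(): floor to voxel
    // coordinates, wrap each into [0, gridSize), then flatten.
    int gridIndex(const Vector3& p) const {
        int mx = wrap(static_cast<int>(std::floor(p.x / gridIntervalSize_)));
        int my = wrap(static_cast<int>(std::floor(p.y / gridIntervalSize_)));
        int mz = wrap(static_cast<int>(std::floor(p.z / gridIntervalSize_)));
        return mx * gridSize_ * gridSize_ + my * gridSize_ + mz;
    }

    void insert(int particleId, const Vector3& p) {
        buckets_[gridIndex(p)].push_back(particleId);
    }

    const std::vector<int>& bucket(int index) const { return buckets_[index]; }

private:
    // % may return a negative remainder, so shift it back into range.
    int wrap(int m) const {
        m %= gridSize_;
        return m < 0 ? m + gridSize_ : m;
    }

    int gridSize_;
    float gridIntervalSize_;
    std::vector<std::vector<int>> buckets_;
};
```

Because each axis wraps modulo the grid size, voxels that are exactly _gridSize cells apart share a bucket, so a neighbour query still has to check real distances.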

This question overlaps with http://stackoverflow.com/questions/650175/good-way-to-hash-a-float-vector — Frerich Raabe, Dec 08 '10 at 11:34

score 3 · Accepted Answer · answered Dec 08 '10 at 12:10


First rule of optimization when there is math involved: Eliminate division, square roots, and trig functions.

inverse_size = 1.0f / _gridIntervalSize;   // do this only once, not once per call

int mx = ((int)floor(position.x * inverse_size) % _gridSize);
int my = ((int)floor(position.y * inverse_size) % _gridSize);
int mz = ((int)floor(position.z * inverse_size) % _gridSize);

I would also recommend dropping the mod operation, because that's another division. If your grid size is a power of 2, you can use & (_gridSize - 1) instead; on two's-complement integers this also maps negative values into [0, _gridSize), which lets you delete the three if (m < 0) corrections, another big saving.
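A sketch of both suggestions together, assuming the grid size is a power of two and that the reciprocal is cached when the cell size is set. VoxelHasher and its members are hypothetical names, not the question's Sph class.

```cpp
#include <cmath>

// Stand-in for the question's Vector3 (assumption).
struct Vector3 { float x, y, z; };

// Assumes gridSizePow2 is a power of two; nothing here checks that.
class VoxelHasher {
public:
    VoxelHasher(int gridSizePow2, float gridIntervalSize)
        : gridSize_(gridSizePow2),
          mask_(gridSizePow2 - 1),
          inverseSize_(1.0f / gridIntervalSize) {}  // the only division

    int gridIndex(const Vector3& p) const {
        // Multiply by the cached reciprocal instead of dividing per call.
        // "& mask_" wraps negative cells into [0, gridSize) on
        // two's-complement ints (required by C++20, universal before),
        // so no "if (m < 0)" fix-ups are needed.
        int mx = static_cast<int>(std::floor(p.x * inverseSize_)) & mask_;
        int my = static_cast<int>(std::floor(p.y * inverseSize_)) & mask_;
        int mz = static_cast<int>(std::floor(p.z * inverseSize_)) & mask_;
        return (mx * gridSize_ + my) * gridSize_ + mz;
    }

private:
    int gridSize_;
    int mask_;
    float inverseSize_;
};
```

One caveat: x * (1 / s) can differ from x / s in the last bit, so a point within rounding distance of a cell boundary may land in the neighbouring cell. If the cell size is itself a power of two, the reciprocal is exact and the two forms agree.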

On another note, using overloaded operators may be hurting you. This is a touchy subject here so I'll let you experiment with it and decide for yourself.


phkahler


x, y and z are only public members which store a float. Therefore, there are no overloaded operators at this point. – Etan Dec 09 '10 at 10:39

Also, most compilers will automatically convert division by a constant to multiplication by a constant if there is any value to doing so. This even applies for things that are only constant for the loop, though that will only apply if this function gets inlined into the loop. Still, since this is a problem case, the experiment is worthwhile. If grid size is a constant power of 2, the `%` should be optimised to a bitwise and. If not, it could be worth imposing that power-of-two rule and using bitwise and directly. – Dec 09 '10 at 11:49

@Steve - according to K&R, the sign of the % operator is machine dependent. So using it in place of & would not be portable. Also, as long as gridsize is a variable that optimization won't be possible at compile time. I guess I prefer to make those optimizations manually rather than let the compiler do it. – phkahler Dec 10 '10 at 00:37
For the compiler optimisation, it would be using & for % rather than vice versa, and some compilers (IMO notably GCC) will do optimisations that change behaviour when that behaviour is undefined by the standard. It probably is a relatively unlikely case, though, now you mention it: most likely only compilers that already use the modulo (never negative) semantics will apply the optimisation automatically, but I believe most use the remainder semantics to match truncating division (meaning results can be negative). – Dec 10 '10 at 11:15
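The disagreement is easiest to see with concrete values. Under C89 (the K&R rule quoted above) the sign of % with a negative operand was implementation-defined; C99 and C++11 fixed integer division to truncate toward zero, so the remainder takes the sign of the dividend. The helper names below are illustrative only.

```cpp
// Remainder exactly as written in the question: negative for negative a.
constexpr int truncRemainder(int a, int n) { return a % n; }

// Mathematical modulo for n > 0: always in [0, n).
constexpr int floorMod(int a, int n) {
    return a % n < 0 ? a % n + n : a % n;
}

// Mask form: agrees with floorMod only when n is a power of two and
// int is two's complement.
constexpr int maskMod(int a, int nPow2) { return a & (nPow2 - 1); }

static_assert(truncRemainder(-7, 4) == -3, "truncating remainder (C99/C++11)");
static_assert(floorMod(-7, 4) == 1, "wrapped into [0, 4)");
static_assert(maskMod(-7, 4) == 1, "mask agrees for a power of two");
static_assert(floorMod(-7, 6) == 5 && maskMod(-7, 6) != 5,
              "mask is wrong for non-powers of two");
```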

score 1 · Answer 2 · 2010-12-09T12:29:47.367

I assume you use floor because negative values are possible, and because you don't want an anomaly due to the default truncation when you cast to int (values rounding toward zero from both sides, making some oversized voxels).
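The anomaly is easy to demonstrate: with plain truncation, everything strictly between -1 and +1 cell widths maps to cell 0, a cell twice the normal size. A minimal sketch comparing the two (function names are illustrative):

```cpp
#include <cmath>

// Cast only: rounds toward zero, so cell 0 covers (-cellSize, cellSize).
int truncCell(float v, float cellSize) {
    return static_cast<int>(v / cellSize);
}

// Floor: every cell is [k * cellSize, (k + 1) * cellSize).
int floorCell(float v, float cellSize) {
    return static_cast<int>(std::floor(v / cellSize));
}
```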

If you can specify a safe most-negative value for each value in the vector, you could subtract that (negative) value, or rather the nearest more-negative multiple of _gridIntervalSize, before the cast, and drop the floor.
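A minimal sketch of the offset idea, assuming a known lower bound minCoord that is already a multiple of the cell size. minCoord is a hypothetical parameter; as the comments point out, the question's grid has no such bound.

```cpp
// Assumes every v >= minCoord, and minCoord is a multiple of the cell size.
// After the shift the scaled value is non-negative, where truncation and
// floor agree, so the cast alone replaces std::floor.
int offsetCell(float v, float minCoord, float inverseCellSize) {
    return static_cast<int>((v - minCoord) * inverseCellSize);
}
```

The result is floor(v / cellSize) shifted by the constant minCoord / cellSize, which does not matter for hashing.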

Using fmod may ensure you have a safe most-negative value, and replace the integer %, but it's probably an anti-optimisation. Still, as a quick change, it may be worth checking.
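For checking, a sketch of the fmod variant, assuming gridSize > 0 (fmodCell is a hypothetical name). Wrapping into one grid period first makes the value non-negative, so the cast can replace both floor and %; the clamp handles a tiny negative remainder rounding up to a full period.

```cpp
#include <cmath>

// Assumes gridSize > 0. Likely slower than the integer version; measure.
int fmodCell(float v, float cellSize, int gridSize) {
    const float period = cellSize * static_cast<float>(gridSize);
    float r = std::fmod(v, period);   // exact, in (-period, period), sign of v
    if (r < 0.0f) {
        r += period;                  // now in [0, period] after rounding
    }
    int cell = static_cast<int>(r / cellSize);
    if (cell >= gridSize) {           // r was a tiny negative that rounded up
        cell = gridSize - 1;
    }
    return cell;
}
```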

Also, check whether your platform supports vector instructions, and whether your compiler can easily be encouraged to use them. x86 chips certainly have integer vector instructions as well as float (the old Pentium 1 MMX instructions, for a start) and might be able to handle this much more efficiently than the "normal" CPU instruction set. This may even be a case for digging out the list of vector instruction intrinsics for your compiler and doing some hand-optimisation. Just check what the compiler can do for you first - I'm not sure how much of this kind of optimisation compilers will do for you already.
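Before reaching for intrinsics, it may be enough to give the auto-vectoriser a simple loop. This sketch assumes a structure-of-arrays layout (separate x, y and z arrays, not the question's Vector3) and a power-of-two grid size; gridIndices is a hypothetical name. Whether SIMD code is actually emitted depends on the compiler and flags (on x86 the packed floor instruction, roundps, is SSE4.1, for example); GCC's -fopt-info-vec and Clang's -Rpass=loop-vectorize report what was vectorised.

```cpp
#include <cmath>
#include <cstddef>

// Structure-of-arrays input and a branch-free loop body: a layout that
// gives the compiler a chance to vectorise. Assumes gridSize is a power
// of two and inverseSize == 1 / cell size.
void gridIndices(const float* xs, const float* ys, const float* zs,
                 int* out, std::size_t count,
                 float inverseSize, int gridSize) {
    const int mask = gridSize - 1;
    for (std::size_t i = 0; i < count; ++i) {
        int mx = static_cast<int>(std::floor(xs[i] * inverseSize)) & mask;
        int my = static_cast<int>(std::floor(ys[i] * inverseSize)) & mask;
        int mz = static_cast<int>(std::floor(zs[i] * inverseSize)) & mask;
        out[i] = (mx * gridSize + my) * gridSize + mz;
    }
}
```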

One probably trivial piece of micro-optimisation...

return (mx * _gridSize + my) * _gridSize + mz;

Saves one integer multiplication. Trivial, of course, and the compiler may catch it anyway, but this is an old habitual thing.

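Oh - watch the leading underscores. Names starting with an underscore followed by an uppercase letter (or containing a double underscore) are reserved everywhere, and any name starting with an underscore is reserved at global scope. Members like _gridSize are technically allowed, but the habit is easy to get wrong. Not likely to cause a problem, but you can't complain if it does.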

EDIT

Another way to avoid the floor is to handle positive and negative values separately, if you are willing to accept that items bang on the edge of a grid cell may land in the wrong cell (possible anyway, since floats should be considered approximate). Just apply a -1 offset in the negative case, to pull the value away from zero by almost exactly the right amount to compensate for the truncation. You might consider a bit-fiddling increment-the-mantissa afterwards (so that already-integer values land in the cell you'd expect), but this is probably unnecessary.
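A minimal sketch of the sign split, following the -1 offset described above; approxFloor and exactFloor are illustrative names, and both assume the value fits in an int. The second function is an alternative, not part of the answer, that fixes the edge case with one extra comparison while still avoiding floor.

```cpp
// Sign split with a -1 offset before truncation. Exact except for negative
// values that are already whole numbers: -2.0f gives -3, i.e. a point
// exactly on a negative cell boundary lands one cell low.
inline int approxFloor(float v) {
    return v < 0.0f ? static_cast<int>(v - 1.0f) : static_cast<int>(v);
}

// Alternative (not from the answer): truncate, then step down only if
// truncation rounded up past v.
inline int exactFloor(float v) {
    int t = static_cast<int>(v);
    return v < static_cast<float>(t) ? t - 1 : t;
}
```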

If you can impose power-of-two limitations on your sizes, there may be a bit-fiddling way to extract the grid position from a float efficiently, avoiding some or all of the multiply, floor and % for each of x, y and z, assuming a standard floating-point representation (i.e. this is non-portable). Again, handle positive and negative values separately: extract the exponent, bit-shift the mantissa accordingly, then mask out the unwanted bits.
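A sketch of that extreme version for one axis, under loudly stated assumptions: IEEE-754 binary32 floats, a cell size of exactly 2^cellExp, a grid of 2^gridBits cells, and finite input. cellBits is a hypothetical name, and it is worth benchmarking against the floor version before trusting it to be faster.

```cpp
#include <cstdint>
#include <cstring>

// Assumes IEEE-754 binary32 (non-portable), cell size 2^cellExp,
// 0 < gridBits < 32, finite v (NaN/Inf not handled).
inline std::uint32_t cellBits(float v, int cellExp, int gridBits) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);          // well-defined type pun
    const bool negative = (bits >> 31) != 0;
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu);
    std::uint32_t mant = bits & 0x7FFFFFu;

    // |v| / 2^cellExp == mant * 2^shift
    int shift;
    if (exponent == 0) {                          // zero or subnormal
        shift = -149 - cellExp;
    } else {                                      // normal: restore implicit 1
        mant |= 0x800000u;
        shift = exponent - 150 - cellExp;
    }

    // Integer part of the magnitude (mod 2^32) and whether bits were dropped.
    std::uint32_t intPart;
    bool hasFraction;
    if (shift >= 32) {
        intPart = 0u;                             // a multiple of 2^32
        hasFraction = false;
    } else if (shift >= 0) {
        intPart = mant << shift;
        hasFraction = false;
    } else if (shift > -32) {
        intPart = mant >> -shift;
        hasFraction = (mant & ((1u << -shift) - 1u)) != 0u;
    } else {
        intPart = 0u;
        hasFraction = mant != 0u;
    }

    // floor(-a) == -(trunc(a) + (a has a fraction ? 1 : 0)); unsigned
    // negation wraps, and the mask keeps only the low gridBits bits.
    const std::uint32_t cell =
        negative ? 0u - (intPart + (hasFraction ? 1u : 0u)) : intPart;
    return cell & ((1u << gridBits) - 1u);
}
```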

Sadly, the grid is infinite in all directions, so there is no safe most-negative value. This is also the reason why I use the hashing method. — Etan, Dec 08 '10 at 12:01
@Etan - some extra ideas added. I'd class the second in particular as extreme measures. — , Dec 09 '10 at 12:32

score 0 · Answer 3 · answered Dec 08 '10 at 13:11

I think you need to look higher up the hierarchy to get real speed improvements. That is, is storing points in a hash map really the most efficient solution? I assume you have an array of Vector3 arrays, i.e.:

Vector3 *points[size][size][size];

where each element in the 3D array is an array of Vector3.

The algorithm you're using doesn't guarantee uniform distribution of points in each Vector3 array, which may be a problem. A cluster of points within _gridIntervalSize will map to the same array.

An alternative method would be to use oct-trees, which are like binary trees but each node has eight child nodes. Each node requires the min/max x/y/z values to define the volume the node covers. To add values to the tree:

1. Recursively search the tree to find the smallest node that can contain the point.

2. Add the point to that node.

3. If the number of points in the node exceeds the upper limit for a node, create child nodes and move the points into them.
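The three steps above can be sketched as follows. This is a minimal octree, not a tuned one: the split threshold and depth cap (kMaxPoints, kMaxDepth) are illustrative values, the depth cap prevents endless splitting when many points coincide, and points are assumed to lie inside the root box.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct Point { float x, y, z; };

class Octree {
public:
    Octree(Point minCorner, Point maxCorner, int depth = 0)
        : min_(minCorner), max_(maxCorner), depth_(depth) {}

    // Assumes p lies inside this node's (closed) box.
    void insert(const Point& p) {
        if (children_[0]) {                       // internal node: descend
            children_[childIndex(p)]->insert(p);
            return;
        }
        points_.push_back(p);                     // leaf: store the point
        if (points_.size() > kMaxPoints && depth_ < kMaxDepth) {
            split();                              // too full: push points down
        }
    }

    std::size_t size() const {
        std::size_t n = points_.size();
        for (const auto& c : children_) {
            if (c) n += c->size();
        }
        return n;
    }

    bool isLeaf() const { return !children_[0]; }

private:
    static constexpr std::size_t kMaxPoints = 8;  // illustrative limit
    static constexpr int kMaxDepth = 16;          // illustrative cap

    Point center() const {
        return {(min_.x + max_.x) * 0.5f, (min_.y + max_.y) * 0.5f,
                (min_.z + max_.z) * 0.5f};
    }

    // Bits 0/1/2 are set when the point is in the upper half along x/y/z.
    int childIndex(const Point& p) const {
        const Point c = center();
        return (p.x >= c.x ? 1 : 0) | (p.y >= c.y ? 2 : 0) | (p.z >= c.z ? 4 : 0);
    }

    void split() {
        const Point c = center();
        for (int i = 0; i < 8; ++i) {
            Point lo{(i & 1) ? c.x : min_.x, (i & 2) ? c.y : min_.y, (i & 4) ? c.z : min_.z};
            Point hi{(i & 1) ? max_.x : c.x, (i & 2) ? max_.y : c.y, (i & 4) ? max_.z : c.z};
            children_[i] = std::make_unique<Octree>(lo, hi, depth_ + 1);
        }
        for (const Point& p : points_) {
            children_[childIndex(p)]->insert(p);
        }
        points_.clear();
        points_.shrink_to_fit();
    }

    Point min_, max_;
    int depth_;
    std::vector<Point> points_;
    std::array<std::unique_ptr<Octree>, 8> children_;
};
```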

You may want to use quad-trees if there is little variation in values along a particular axis. Another method is to use BSPs - divide the world into two halves and recurse to find the container to add your point to. Again, these can be dynamic.

Converting the floats to ints and having the division planes lie on integer values will speed up the process as well.

Googling the above terms will lead you to more in depth analysis of the algorithms.

Finally, using floats (or doubles) for coordinates in an infinite space is a bad idea: the further you get from (0,0,0), the less precision you have (the gaps between adjacent floating-point values grow with their magnitude). You will need to 'reset' the floating-point values to keep the precision.

One method is to 'tile' the space and split each coordinate into an integer part and a floating-point part. The integer part identifies the tile and the floating-point part gives the position within the tile. This also gives you a much simpler hashing method: just use the integer parts, with no call to floor and only integer calculations. Another approach is to use fixed-point rather than floating-point values; this would constrain your precision, but it would make calculations across tile boundaries much easier.
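A minimal sketch of the tiled representation along one axis, assuming the tile edge equals the voxel size, a power-of-two grid size, and that a single step moves a point by less than one tile. TiledCoord, advance and axisCell are hypothetical names. Hashing then needs only the integer tile.

```cpp
#include <cstdint>

// Assumes tile edge == voxel size (hypothetical layout).
struct TiledCoord {
    std::int32_t tile;   // which tile (== voxel) along this axis
    float local;         // offset inside the tile, kept in [0, tileSize)
};

// Re-normalise after a move. Assumes |delta| < tileSize, so one
// comparison each way suffices and no floor is needed.
inline void advance(TiledCoord& c, float delta, float tileSize) {
    c.local += delta;
    if (c.local >= tileSize) {
        c.local -= tileSize;
        ++c.tile;
    } else if (c.local < 0.0f) {
        c.local += tileSize;
        --c.tile;
        if (c.local >= tileSize) {  // tiny negative offset rounded up
            c.local = 0.0f;
            ++c.tile;
        }
    }
}

// Per-axis hash: integer work only. Assumes a power-of-two grid size.
inline int axisCell(const TiledCoord& c, int gridSizePow2) {
    return static_cast<int>(c.tile & (gridSizePow2 - 1));
}
```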

If you could expand on what the top-level requirements of your coordinate system are, there are probably better algorithms available to you.

Since the points are moving (it's an SPH particle system), KD-trees or oct-trees are too expensive to build. With the voxel system, you can move points quite easily between the voxels. — Etan, Dec 08 '10 at 14:37
@Etan: You might be able to move points between voxels easily, but the rest of the processing is not efficient, otherwise you wouldn't have asked this question. There are lots of resources about trees and how to build and modify them efficiently. You should easily make up for the cost of tree maintenance through much simpler processing overall. A retro example: Wolf3D vs Doom. Wolf3D had a grid (2D voxels), which was easy to edit and build, whereas Doom had a complex BSP; overall, Doom was far more efficient at rendering. — Skizz, Dec 08 '10 at 15:42
But Doom and Wolf3D levels are fixed, and building the BSP for a level was described (in Dr. Dobb's articles relating to Quake, IIRC) as taking hours. Machines have got faster, but that's still an obvious issue, though that was probably a search for the most efficient possible tree rather than accepting a quicker but unbalanced solution. Anyway, while octrees are more efficiently mutable, they can still become unbalanced. Unbalanced trees cause performance issues just like hash collisions do, and a well-designed hash-based solution is still likely to be faster. — , Dec 09 '10 at 12:16

