2

My 3D game world is stored in a flat array of bytes. The world is 1024 x 1024 in width/length and 256 high. So the array is 268435456 bytes long. To access the array I use an index pointer calculated like so :

WORLD_WIDTH_SHIFT = 10;
WORLD_HEIGHT_SHIFT = 8;

indexPointer = (((Px << WORLD_WIDTH_SHIFT) + Pz) << WORLD_HEIGHT_SHIFT) + Py;

I navigate around the array during processing by adding predefined constants to the index value for performance reasons. A move west (positive x-axis) is achieved like so:

WEST = WORLD_WIDTH * WORLD_HEIGHT;
NORTH = WORLD_HEIGHT;
SOUTH = -WORLD_HEIGHT;
WEST = WORLD_WIDTH * WORLD_HEIGHT;
EAST = -WORLD_WIDTH * WORLD_HEIGHT;
UP = 1;
DOWN = -1;

indexPointer += WEST;

To prevent the index going out of range and to what I stupidly thought would achieve wrapping I added a bit mask to the result:

WORLD_ARRAY_SIZE = WORLD_WIDTH * WORLD_WIDTH * WORLD_HEIGHT;
WORLD_ARRAY_SIZE_MASK = WORLD_ARRAY_SIZE - 1;
WEST = WORLD_WIDTH * WORLD_HEIGHT;

indexPointer = (indexPointer + WEST) & WORLD_ARRAY_SIZE_MASK;

This works fine on the x-axis above but on the z-axis (north-south) the wrapping as I discovered after giving it some thought is obviously not going to work. Correct me if I'm wrong but I need a way to perform a bit mask on the bits that represent the z coordinate within indexPointer. First 8 bits are the height, next 10 bits are z coord and final 10 bits are x coord. Is there a way to perform this without extracting the 10 bits into a separate field then performing the addition and mask and then having to reconstruct the final pointer ?

Wai Ha Lee
  • 8,598
  • 83
  • 57
  • 92
Nick B.
  • 23
  • 4
  • Are you sure that this solves an actual performance problem, compared to e.g. a 3d array? Not saying it definitely doesn't, just interested on the extent this change affects performance considering this is c#. – Rotem Sep 22 '15 at 11:13
  • @Rotem: I think given you implement it as `a[,,]`, it's not that much of a problem. `a[][][]` is another story because of the large amount of cache faults. But still, since a multi-dimensional array does multiplications instead of shifts, it can slow down. I'm however not entirely sure this is the OP's bottleneck. – Willem Van Onsem Sep 22 '15 at 11:21
  • @CommuSoft Yes this is what I'm getting at. I would assume the multiplications would not be that crazy considering c# is doing bounds-checking on each access. – Rotem Sep 22 '15 at 11:23
  • indexPointer is updated/calculated thousands of times in nested loops, even a call to a function is going to have a major impact on performance, hence using the constants to navigate. I could just calculate the pointer directly from the xyz coords(masked with world width) every loop iteration but again I was hoping to be able to get away with a simple addition and logic operation. – Nick B. Sep 22 '15 at 11:34

1 Answers1

1

Given the constants:

HIGH = 1;
NORTH = WORLD_WIDTH;
WEST = NORTH*WORLD_HEIGHT;

you can modify your "increment" function:

public static void UpdatePointer (int indexPointer, int dir) {
    int mask0 = 0xffc0000;
    int mask1 = 0x003ff00;
    int mask2 = 0x00000ff;
    res = (indexPointer&mask0+dir)&mask0;
    res |= (indexPointer&mask1+dir)&mask1;
    res |= (indexPointer&mask2+dir)&mask2;
    return res;
}

This is thus more or less what you need instead of indexPointer += update.


Explanation:

The problem is that when you add a constant, that adding a value can result in carry. You basically want to represent three "integers" into a single integer. Now given the integers have values:

#bits 10   10    8
value 12 1023   15

and you add to the second integer, that one will overflow resulting in:

#bits 10   10    8
value 13    0   15

although the second "integer" has the correct value, the first has incremented.

You can resolve this by masking. Let's take a look at the second mask instruction:

(indexPointer&mask1+dir)&mask1;

(the others are equivalent).

Here we first apply a mask to the indexPointer. This ensures that we "extract" the correct integer. So:

#bits 10   10    8
value 12 1023   15

now becomes:

#bits 10   10    8
value 0  1023    0

This can be useful to prevent the dir from overflowing the third integer and thus causing an increment of the second one.

Now we "add" the dir to the value, resulting in:

#bits 10   10    8
value  1    0    0

but now we mask again resulting in:

#bits 10   10    8
value  0    0    0

So the second mark instruction has determined the value of the second "integer" to be zero, and doesn't have any impact on the other integers.

The same for the first mask, where carry provention is more prominent:

#bits 10   10    8 //initial state
value 12 1023   15

#bits 10   10    8 //masking with mask0
value 12    0    0

#bits 10   10    8 //adding dir
value 12    1    0

#bits 10   10    8 //masking again
value 12    0    0

If you later perform an or operation on the three operations, the final state will be:

#bits 10   10    8 //final result
value 12    0   15

Alternative:

an alternative is to use "overflow" bits. In that case the indexPointer uses 8+1+10+1+10 = 30 bits, with one bit in between for overflow. In that case an update is:

indexPointer += dir;
indexPointer &= 0x3ff7feff;

where 0x3ff7feff is a mask to clear the overflow bits again:

0x3ff7feff = 11 1111 1111   0   11 1111 1111   0   1111 1111

in order to "use" the pointer, you will need a small compact operation:

int realPointer = ((indexPointer&0x3ff00000)>>0x02)|((indexPointer&0x7fe00)>>0x01)|(indexPointer&0xff);

Or the update function reads:

public static void UpdatePointer (int indexPointer, int dir) {
    int temp = ((indexPointer&0xffc0000)<<0x02)|((indexPointer&0x3ff00)<<0x01)|(indexPointer&0xff);
    temp += dir;
    return ((temp&0x3ff00000)>>0x02)|((temp&0x7fe00)>>0x01)|(temp&0xff);
}

evidently you need to pay the price somewhere.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • Thanks for taking the time to answer. What you say is absolutely correct and a potential solution, but as I commented above it would probably be quicker to just recalculate the pointer from the XYZ coordinates again. – Nick B. Sep 22 '15 at 11:37
  • @NickB.: the problem is that it will nearly always be a "correct" pointer. In other words, information is lost after applying the operation. You thus have to do it during the operation. You could [*force inline*](http://stackoverflow.com/questions/12303924/how-to-force-inline-functions-in-c) to reduce the overhead of calling the method. – Willem Van Onsem Sep 22 '15 at 11:41
  • In the table above in your answer should not the final state be 12 0 15 ? Or am I missing something ? – Nick B. Sep 22 '15 at 11:50
  • @NickB.: yes the final result is `12 0 15`. An alternative is to work with "overflow" buffers: bits in between that are the drain for potential overflow. In that case you pay the price when calculating the real indexPointer. – Willem Van Onsem Sep 22 '15 at 11:57
  • Thanks CommuSoft, I think all in all the best thing for me is to use my current solution which is to use the method I described above for the 'inside of the array' which is after all the vast majority of it and when dealing with the 'edges' where wrapping may occur resort to calculating the pointer from the xyz coords. Thanks for your input it has really helped my thinking on the subject. – Nick B. Sep 22 '15 at 12:06
  • I just thought of a rather obvious alternative: keep two pointers, one on z axis and one on x, then just apply appropriate mask after operation and add them, plus y component of course. – Nick B. Sep 22 '15 at 12:20
  • Yes, you can do that as well. The downside is that you will carry out three additions (plus or operations and masking), so that the computational cost will still have an impact. As said before *There is no such thing as a free lunch* :(. The operations proposed here, can be done with a vector containing elements in the *x*, *y* and *z* direction all at once. In other words, aiming to save a few instructions for these. – Willem Van Onsem Sep 22 '15 at 12:22