Memory efficient AI objects for physics game

Question

I am creating a physics game in java using box2d.

I am writing an AI class and want to make sure my data is being stored as efficiently as possible, taking into consideration memory alignment.

The tiniest increase will likely make a huge difference because I am literally running 'as many AI objects as I can' until the system slows down.The program is already using a lot of memory on the collision detection, because once again, I want to be able to support as many agents as possible.

What I understand so far, is that the smallest Java type is 8bytes, and that objects are padded into multiples of 8. I have structured my AI controls in boolean arrays, representing movement: x+/-1, y+/-1, and clockwise/CCW rotations for certain fixtures.

Since Java doesn't have null settings for Booleans, I nested the controls in command objects with the bool values on_off, and pos_neg. With the movements and rotations, I'm dealing with about 7 command objects per one 'default' action, (such as move right). So I create Command arrays for each action.

My question is: Am I doing this efficiently?

I haven't finalized the design, so I'm not certain about the size of each array. But, given memory-alignment requirements, I'm guessing I will have at least some padding, which is ultimately wasted memory.I'm considering doing something like cutting the object size to fit the padding limit and then pushing the remaining data from multiple objects into an 'overflow' object... or something like that.

Will this speed things up? Why or why not?

I'm also considering using bitsets, though I think my command objects may have achieved a similar result, and I've been told bit-shifting is slow.

public class Command {

        boolean on_off  = false;
        boolean pos_neg = false;
}

public class DefaultMoves {

    //Note: these arrays are 2d in the event multiples are necessary
    //to complete a single action, I may change this later.
    Command[][] mvRight =    
        { 
              {     new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(true, true), //moveX
                    new Command(false, false)  //   
              },   
        };
    Command[][] mvLeft =    
        { 
              {     new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(true, false), //moveX
                    new Command(false, false)  //   
              },   
        };
}

To improve the clarity of your description (and more effectively attract answers), please include some code snippets in your question to illustrate your points. — EkcenierK, Nov 20 '15 at 15:09

Mats Petersson · Accepted Answer · 2015-11-23T21:38:33.033

3

This was going to be just a comment, but got a bit lengthy, and I didn't feel like writing it as 3 comments.

Since this is a follow-up question of another, I would start with "Stop worrying about padding". Worry about how you store YOUR data.

And, if you are going to worry about how much space your things take, allocate an array of 7 objects instead of 7 individual objects. I'm sure Java has overhead in each allocation. In a typical C or C++ implementation, each allocation with new or malloc takes up 16-32 bytes above and beyond the size of the actual data allocated, and the size is rounded to 16 or 32 bytes. In java there is a suggestion here that the memory overhead of an object is 8 bytes - this may not be true for ALL Java VMs and implementations.

Further, all space&time optimisation is a compromise between space and time [nearly always at least], so storing your data in a more compact form will cost in time for the saving in space. I can for example think that having pairs of on_off and pos_neg as two bits in a larger integer structuer. So your 7 commands would be stored in one integer. However, now you have to do shifts and masking to get the right data out. Likewise, shifting and oring if you are going to store something. (I'm writing this as C, since I don't know Java well enough).

/* This is big enough for 16 pairs of on_off and pos_neg */
/* In a pair of bits, bit 0 = on_off, bit 1 = pos_neg */
uint32_t cmdData;

/* Get values of on_off and pos_neg for element number n */
void getData(int n, bool& on_off, bool& pos_neg)
{
    uint32_t shifted = cmdData >> (2 * n);
    on_off = (shifted & 1) != 0;
    pos_neg = (shifted & 2) != 0;
}

/* Store values for element n */
void setData(int n, bool on_off, bool pos_neg)
{
    uint32_t bits = (int)on_off + (2 * (int)pos_neg); 
    uint32_t mask = 3 << (n * 2);
    cmdData &= ~mask; /* Clear bits */
    cmdData |= bits << (n * 2);
}

As you can see, this stores the data more efficiently, as we can store 16 pairs of {on_off, pos_neg} in 4 bytes, rather than each (probably) taking up a byte each. But to get to each, you have to do a few extra operations each time (and the code gets messier). Whether this is "worth having" or not highly depends on the situation, how often you access these versus how low on memory the system is (assuming the memory usage for THESE objects is what is causing the problem - if you have 100 command structures and 40000000 objects that USE the commands, then the commands aren't going to be the issue).

The way I would store the commands for direction/movement is probably as two integer values (int8_t [byte in java] if space is tight), holding a +1 for moving for example right or down, -1 for moving left or up. This is not the most compact form, but it makes for easy access and easy calculation of a new position.

This pair can then be used to describe all possible directions:

struct direction
{
     int x, y;
};

direction directions[] =
{
     { 0, 0 },    // Don't move.
     { 0, 1 },    // Right.
     { 0, -1 },   // Left.
     { 1, 0 },    // Down.
     { -1, 0 },    // Up.
 };

If you want to move diagonally too, you'll have to add another four pairs with the combinations of { 1, 1 }, {-1, 1}, etc.

The same can be applied to an object that can move, as a pair of xDir, yDir values.

But the key here is that you want to first of all come up with a good understanding of what is more important: space or computation. From that, find out what objects are taking up most of the space (what has the highest count). Fiddling with the size of objects that you have one or a few dozen of won't make any big different, something you have millions of will). If space is not an issue (and to be fair, it's really hard to write code that efficiently uses large enough amounts of data in a system with gigabytes of RAM - it is usually the CPU that runs out of speed before memory is exhausted if you do something to each object for every frame).

Suitcase analogy:

Imagine that you have a suitcase that can hold exactly 4 little boxes in it's width [and any number at the length - it's a weird suitcase!], and you have larger boxes that are 1, 2, 3 or 4 units of little boxes. The boxes are made with "velcro", so they stick together and can be split as you like, but you have to keep track of which belong together, and each time you "split" or "put back together" the units, it takes extra time.

If you want to be lazy and make it simple, you just stick your three-box ones into the suitcase with one empty space next to each.

 1 2 3 4
 a a a x
 b b b x
 c c c x
 d d d x

and so on.

If you want to pack it tightly, you take one 3 unit box, and then cut one unit of the next one, and stick it next to the first one, then the remnant two units on the next space, then cut a two unit piece from the next one, and stick it next to packet 2, and so on.

1 2 3 4
a a a b
b b c c
c d d d

Now, you have used 25% less space to store them, but you spent time split them, and you have to spend time again to fetch them out into units of three when you later need to use the data.

Now, imagine that you get paid to put things into the suitcase, and you get paid per item you put, which method do you choose?

Then consider that you instead of being paid per item, you have to pay for suitcase space (because you are the owner of the company). Now you want to squeeze as much into the space as possible. But it takes extra time, right? If suitcase is expensive, then it may be worth it. If it's not so expensive, you may prefer to save time over space. It's a compromise.

[We could do the same thing in more realistic units of 8, 32 or 64, but I think that would just make it harder to read and definitely make it harder to type]

edited Nov 23 '15 at 21:38

answered Nov 21 '15 at 08:28

Mats Petersson

126,704
14
140
227

Fair enough, I guess I'll have to start looking in to CPU efficient practices. Though some say Java is best went you don't try to manually optimise. – bigcodeszzer Nov 22 '15 at 00:17
I think you misunderstood: You need to determine if your problem comes from "using a lot of memory" or if it comes from "using up all the CPU-time". And most languages that have any form of optimisation will behave best for "natural code" - in other words, the compiler/JIT will be more likely to recognize the normal version of something it can be clever about than some "very clever" solution. – Mats Petersson Nov 22 '15 at 09:31
True. But I'm still baffled that compilers don't fill in padding. – bigcodeszzer Nov 22 '15 at 19:33
Fill with what? The whole point of padding is to improve performance by aligning data in a way that makes it easily accessible (and sometimes also "not crash", depending on the system, but modern systems typically do accept unaligned data, just slower than aligned). – Mats Petersson Nov 22 '15 at 22:29
Right, but shouldn't the padding contain actual program data? Doesn't inserting empty padding seem inefficient? – bigcodeszzer Nov 23 '15 at 20:20
As given in my example in the C++ question, adding 3 bytes of data to a structure will give a 7.5% increase in speed. Speed vs. size is very often a compromise. You can't put useful data inside the padding without costing time in some way or another. See my example above, shifting two bits in and out of a larger data type - it will cost time, but save space (being 4-8 times better than storing it in 1 or 2 bytes, but most likely 3-5 times slower - it's an engineering decision which is more important, space or time) – Mats Petersson Nov 23 '15 at 21:03
You can think of it like packing a suitcase with boxes. If you have to "play tetris" to make it fit, you will need more time to perfectly squeeze everything in, compared to having a much bigger suitcase where the boxes fit easily without packing them tightly. – Mats Petersson Nov 23 '15 at 21:04
I think I understand what you mean. What I wonder though, is could the 'suitcase tetris' be done effectively in oop paradigms. Suffice to say, in an entire program, there should be members of close to every size, so why aren't they all arranged so that no padding is necessary, with everything aligned naturally? – bigcodeszzer Nov 23 '15 at 21:13
I realize this would muck up program structure, but shouldn't be a compiler-side operation? – bigcodeszzer Nov 23 '15 at 21:14
Instead of padding 100 objects with a few bits, re-distributing all of the objects so the fit exactly, and then just have one object leftover that needs to be padded if the numbers don't divide perfectly. – bigcodeszzer Nov 23 '15 at 21:17
Ok. I see what you mean, yes. With bitsets the shifts are costly but in the case of objects, does it take more time to cut and paste members? – bigcodeszzer Nov 23 '15 at 21:49
Depends on the details, but in the typical case. Like I said, I posted an example where there is 7.5% difference between packed and padded data - I know compiler engineers that are happy to get 1% improvement on a bit of code by optimising some code generated by the compiler. 7.5% improvement is "let's go to the pub to celebrate" type improvement, unless you knew the code was bad in the first place. – Mats Petersson Nov 23 '15 at 21:52
If it was absolutely free and no cost, the compiler [or JIT, or whatever] wouldn't have code to align data within a structure/object. After all, that is extra work in the compiler, and compiler writers tend to be rather strict about doing stuff that serves no purpose. – Mats Petersson Nov 23 '15 at 21:55
Actually wait a minute. If the system takes more time to define an object that handles its own padding, that will slow down its creation. But if they aren't being created repetitively, that would only be one point of lag for all of the memory saved over the entire life cycle? – bigcodeszzer Nov 23 '15 at 21:59
As for read/write time.... I definitely don't know about that in terms of objects, but I would think that there wouldn't be much difference? – bigcodeszzer Nov 23 '15 at 22:00
No, it takes more time during compilation to figure out the suitable padding, to make it fast when you execute the code. If it wasn't necessary, the compiler would just put all things next to each other without padding, with no extra effort! – Mats Petersson Nov 23 '15 at 22:05
Yes but if you have to initialize 1000 objects and you manually deal with the padding, you will have that -7% because of the time it takes to 'arrange the suitcase'. But, if the objects were just created once, that would be a -7% followed by the continuous increase in speed from saved RAM, no? – bigcodeszzer Nov 23 '15 at 22:28
It might be longer to initialize them, but that could be worth it, no? – bigcodeszzer Nov 23 '15 at 22:29
The padding is dealt with when laying out the structure during compilation, which happens once for each type (obect/struct/etc), not 1000 times for an array of 1000 elements. If you do it right the first time, (in other words add an `x` between each `aaa` and `bbb` etc), then it's easy to access each of them. If you don't add padding, then the processor or the code has to do extra wok to separate and put back together the right parts. So there is an overhead when compiling to add padding, and a benefit when running the code. Since code typically runs more than it gets compiled, it's a win. – Mats Petersson Nov 23 '15 at 22:32
I was going to write the code, but it will take too long. So lets just say hypothetically you have an object that excepts dynamic data. If the data exceeds the padding limit (which would be x/y %q) or something like that, than the remaining data, lets say char, gets stored in an object of another class called 'overflow'. A pointer is created to record where the remaining data has been stored. Every time 5 objects are initialized, one overflow object is created with lets say space of 64. When you want to read or write to an overflow value, the object implements logic to use the pointer. – bigcodeszzer Nov 23 '15 at 23:12
The result is: All of the objects are perfectly aligned in memory, no padding necessary, except for the last overflow object created, which will have a small about padding. All that RAM is saved. What I don't understand, is how this would be slower? – bigcodeszzer Nov 23 '15 at 23:14
But you have to pack and unpack every object before you can use it, which is going to add two instructions (possibly more) for every item of data. Which most likely makes it three times slower than padding - and for compilers, speed over space nearly always wins. – Mats Petersson Nov 23 '15 at 23:15
Not sure what you mean by pack and unpack? – bigcodeszzer Nov 23 '15 at 23:16
In my example, it's 7.5% slower to NOT pack the data. If you don't understand why, I'd accept that it's hard to accept, but the FACT is there, right? – Mats Petersson Nov 23 '15 at 23:16
I thought you said it was slower because of bitshifting? – bigcodeszzer Nov 23 '15 at 23:17
To split `aaab` and `bbcc` into `aaa` and `bbb` is unpacking, to combine `aaa`, `bbb` into `aaab` and `bbcc` (or `bbxx` is packing). That is, shuffling their "no padding" places. – Mats Petersson Nov 23 '15 at 23:17
And it has to be unpacked every time you access it? – bigcodeszzer Nov 23 '15 at 23:19
If you go for extreme packing, yes, bitshifts, ands and ors are involved. In my example in the other thread, the reason for 7.5% speedup with aligned data is simply that the structure is aligned such that the processor can access each element directly, rather than "separate stuff" (think taking apart and putting together velcro boxes). But the key is that "padding is there because it makes the code faster". To NOT have padding will make the code slower. Padding is there as a compromise between data space usage and CPU usage in time. – Mats Petersson Nov 23 '15 at 23:22
Yes, packing every time you write, unpacking every time you read. – Mats Petersson Nov 23 '15 at 23:22
Does that mean linked lists are really slow then? – bigcodeszzer Nov 23 '15 at 23:28
Compared to? And how organised? If elements are aligned correctly (have the right padding), the packing and unpacking is free. It's only when things aren't aligned properly that it gets messy. That's the reason for the padding (`x` in my little suitcase example). – Mats Petersson Nov 23 '15 at 23:33

Memory efficient AI objects for physics game

1 Answers1

Linked