2

To preface this - I have no influence on the design of this problem and I can't really give a lot of details about the technical background.

Say I have a lot of components of the same type that regularly get a boolean event - and I need to hold a short history of these boolean events.

A coworker of mine wrote a rather naive implementation using the type Map<Component, CircularFifoQueue<Boolean>>, CircularFifoQueue being data structure from Apache Commons. The code works, but given how generics work in Java and the dimensions used, this is really inefficient as it stores a reference to one of the two singleton boolean objects instead of just one bit.

Generally there are around 100K components and the history is supposed to hold the 5-10 most recent boolean values (might be subject to change but probably won't be larger than 10). This currently means that around 1.5GB of RAM are allocated just for these history maps. Also these changes happen quite frequently so it wouldn't hurt to increase the CPU efficiency if possible.

One obvious change would be to move the history into the Component class to remove the HashMap-induced overhead.

The more complicated question is how to efficiently store the last few boolean values. One possible way would be to use BitSets, but as those use long[] as their underlying data structure, I doubt it would be the most efficient way to store what is essentially 5 bits.

Another option would be to directly use an integer and shift the value as a way to remove old entries. So basically

int history = 0;
public void set(int length, boolean active){
   if(active) {
      history |= 1 << length;
   } else {
      history &= ~(1 << length);
   }
   // shift one to the right to remove oldest entry
   history = history >> 1;
}

Just off the top of my head. This code is untested. I don't know how efficient or if it works, but that is about what I had in mind.

But that would still lead to quite some overhead compared to the optimal case of storing 5 bits of data using 5 bits of memory.

One could achieve some additional saving if the histories of the different components were stored in a contiguous array, but I'm not sure how to handle either one giant contiguous BitSet. Or alternatively a large byte[] where each byte represents one bool-history as explained above.

This is a weirdly specific problem and I'd be really glad about any suggestions.

dohamann
  • 86
  • 7
  • 2
    If you're only dealing with 5 bits you can use a `byte` instead of an `int`. You can then `history &= 0b11111` to clear older entries. However, at this point any optimization is minimal and probably won't affect your performance and/or memory footprint. – Benjamin Urquhart Aug 15 '19 at 17:40
  • Can you give an index to each `Component`? Why are you storing them in a `Map`? – fps Aug 15 '19 at 17:49
  • @BenjaminUrquhart Thanks for the suggestion but AFAIK (and please correct me if I'm wrong) that actually doesn't make a difference in Java. Due to memory alignment the byte member (or even a boolean for that matter) will be stored using 32 bit anyway. – dohamann Aug 15 '19 at 18:22
  • @FedericoPeraltaSchaffner As I said I personally wouldn't store them in a map, that's just the way it is right now. The components are actually already indexed. I think my best bet right now is to use a byte array where each byte represents the history of a component. – dohamann Aug 15 '19 at 18:25
  • If you already have all the components indexed (or can achieve it somehow), I'd go with a big `BitSet` to store all the components, i.e. I'd use 10 bits to represent each component's history – fps Aug 15 '19 at 18:39

1 Answers1

2

Setting aside the bit manipulations which I'm sure you'll conquer, please think how efficient is efficient enough.

Every instance of

class Foo {}

allocates 16 bytes. So if you were to introduce

class ComponentHistory {
  private final int bits;
}

that's 20 bytes.

If you replace the int with byte, you're still at 20 bytes: byte type is padded to 4 bytes by JVM (at least).

If you define a global array of bits somewhere and refer to it from ComponentHistory, the reference itself is at least 4 bytes.

Basically, you can't win :)


But consider this: if you go with the simplest approach that you have already outlined, that produces simple readable code, your 100K component histories will take up 2MB of RAM - substantial savings from your current level of 1.5GB. Specifically, you've saved 1498MB.

Suppose you indeed invent a cumbersome yet working way of only storing 5 bits per history. You'd then need 500Kb = 60KB to store all histories. With the baseline of 1.5GB, your savings are now 1499.94MB. Savings improve by 0.1%. Does that at all matter? More often than not, I'd prefer to not over-optimize here while sacrificing simplicity.

iluxa
  • 6,941
  • 18
  • 36