1

I've got a piece of code that takes into account a given amount of features, where each feature is Boolean. I'm looking for the most efficient way to store a set of such features. My initial thought was to try and store these as a BitSet. But then, I realized that this implementation is meant to be used to store numbers in bit format rather than manipulate each bit, which is something I'd like to do (see the effect of switching any feature on and off). I then thought of using a Boolean array, but apparently the JVM uses much more memory for each Boolean element than the one bit it actually needs.

I'm therefore left with the question: What is the most efficient way to store a set of bits that I'd like to treat as independent bits rather than the building blocks of some number?

Community
  • 1
  • 1
shakedzy
  • 2,853
  • 5
  • 32
  • 62
  • This question is not about Java. – Joe C Mar 12 '17 at 09:26
  • how many features are we talking about that makes you worry about memory usage? – Lodewijk Bogaards Mar 12 '17 at 16:30
  • 2
    Also, I don't understand your remark "my initial thought was to try and store these as a BitSet, but then realized that this implementation is meant to be used to store numbers in bit format rather than manipulate each bit ". That is what BitSets are for: manipulating large numbers of bits in an efficient way. Individual or not. Why would BitSets be wrong for your use-case? – Lodewijk Bogaards Mar 12 '17 at 16:41
  • @LodewijkBogaards how do I manipulate bits X and Y when I have a loot of bits without calculating the number that will be represented by this bitset? – shakedzy Mar 13 '17 at 08:21
  • That article is outdated and the second half is spam. Still, I don’t get your conclusion that you can’t manipulate single bits. The non-spam part of that article *is* about manipulating bits. – Holger Mar 13 '17 at 15:27

1 Answers1

2

Please refer to this question: boolean[] vs. BitSet: Which is more efficient?

According to the answer of Peter Lawrey, boolean[] (not Boolean[]) is your way to go since its values can be manipulated and it takes only one byte of memory per bit to store. Consider that there is no way for a JVM application to store one bit in only one bit of memory and let it be directly (array-like) manipulated because it needs a pointer to find the address of the bit and the smallest addressable unit is a byte.

The site you referenced already states that the mutable BitSet is the same as the java.util.BitSet. There is nothing you can do in Java that you can't do in Scala. But since you are using Scala, you probably want a safe implementation which is probably meant to be even multithreaded. Mutable datatypes are not suitable for that. Therefore, I would simply use an immutable BitSet and accept the memory cost.

However, BitSets have their limits (deriving from the maximum number of int). If you need larger data sizes, you may use LongBitSets, which are basically Map<Long, BitSet>. If you need even more space, you may nest them in another map Map<Long, LongBitSet>, but in that case you need to use two or more identifiers (longs).

Community
  • 1
  • 1
matfax
  • 634
  • 11
  • 17
  • 2
    boolean[] is the most efficient from a performance point of view. BitSet is more efficient for memory usage for larger sets of data. – Peter Lawrey Mar 12 '17 at 20:35
  • Is `Boolean` the default Scala Boolean? What is `boolean` than? – shakedzy Mar 13 '17 at 08:29
  • Boolean wraps boolean and offers more methods. Check this: http://stackoverflow.com/questions/3728616/boolean-vs-boolean-in-java – matfax Mar 13 '17 at 12:50
  • According to http://www.scala-lang.org/api/2.12.x/scala/Boolean.html, the Scala Boolean wraps the primitive Java boolean. This might also be interesting for you: http://stackoverflow.com/questions/13084290/how-can-i-use-primitives-in-scala – matfax Mar 13 '17 at 13:05