I am looking for an alternative to Java Bitset implementation. I am implementing a high performance algorithm and seems like using a Bitset object is killing its performance. Any ideas?
-
5Could you give us more specifics about what operations of `BitSet` appear to *kill the performance*? A short snippet of code that you profiled to show its slowness would be ideal. – Sergey Kalinichenko Jan 10 '12 at 14:03
-
Your question should rather be "why is this bitset killing my performance?" --and notice that I am already giving you some credit by not suggesting that it should be "what is killing my performance here?" – Mike Nakis Jan 10 '12 at 14:05
-
Well, an "alternative" might be doing bit operations on primitives (long, int etc.) yourself. However, as already stated you should elaborate on your goals and the exact performance problem. – Thomas Jan 10 '12 at 14:06
-
I would consider the whole problem and try to remove the need to create a BitSet at all. To do that I would need a broader understand of the problem you are trying to solve. – Peter Lawrey Jan 10 '12 at 14:19
5 Answers
Someone here has compared boolean[]
to BitSet
and concluded with:
BitSet
is more memory efficient thanboolean[]
except for very small sizes. Eachboolean
in the array takes a byte. The numbers fromruntime.freeMemory()
are a bit muddled forBitSet
, but less.
boolean[]
is more CPU efficient except for very large sizes, where they are about even. E.g., for size 1 millionboolean[]
is about four times faster (e.g. 6ms vs 27ms), for ten and a hundred million they are about even.
If you Google, you can find some alternative implementations as well, like JavaEWAH, used by Apache Hive, Apache Spark and Eclipse JGit. It claims:
The goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, maybe at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap as implemented in the BitSet class). Unlike some alternatives, javaewah does not rely on a patented scheme.
While searching an answer for my question single byte comparison vs multiple boolean comparison, I found OpenBitSet
They claim to be faster than Java BitSet and direct access to the array of words storing the bits.
I am definitely gonna try that. See if it solve your purpose too.

- 1
- 1

- 7,193
- 12
- 64
- 90
-
1It should be worth mentioning, that OpenBitSet doesn't exist anymore since Lucene 5.0.0. Now, we have to use [LongBitSet](https://lucene.apache.org/core/5_2_0/core/org/apache/lucene/util/LongBitSet.html) which doesn't have the ability to grow automatically anymore. – ThexXTURBOXx Feb 23 '21 at 08:26
Look at Javolution FastBitSet : A high-performance bitset integrated with the collection framework as a set of indices and obeying the collection semantic for methods such as FastSet.size() (cardinality) or FastCollection.equals(java.lang.Object) (same set of indices).
See also http://code.google.com/p/guava-libraries/issues/detail?id=724#c3.

- 7,193
- 12
- 64
- 90

- 24,954
- 11
- 143
- 151
There are a number of compressed alternatives to the BitSet class. EWAH was already mentioned (https://github.com/lemire/javaewah). More recent additions include Roaring bitmaps (https://github.com/RoaringBitmap/RoaringBitmap) that are used by Apache Lucene, Apache Spark, Elastic Search, and so forth.

- 3,470
- 2
- 25
- 23
If you really must squeeze the maximum performance out of this thing, and if memory does not matter, you can try storing each one of your flags in an integer whose bit size is equal to the width of the data bus of your CPU.
You are probably on a 64-bit data bus CPU, so try long integers.

- 56,297
- 11
- 110
- 142
-
-
Because if alignment counts on your architecture, then you want to go with the exact size of the data bus, no more, no less. And modern architectures usually have 64-bit address buses, not 32-bit. I am not saying that this will necessarily work, so be sure to benchmark it. It depends on how your CPU accesses your memory. – Mike Nakis Jan 10 '12 at 19:21