5

I would like to know what is the memory usage of BitSet in Scala.For example, if I do:

  var bitArray:BitSet=new BitSet(10)
  bitArray.add(0)
  bitArray.add(2)
  bitArray.add(4)
  bitArray.add(6)
  bitArray.add(8)

How does that compare with and array containing the even numbers 0,2,4,6,8?

What about writing a number in binary:

  var bitArray:BitSet=new BitSet(32)
  bitArray.add(5)
  bitArray.add(3)
  bitArray.add(2)
  bitArray.add(1)
  bitArray.add(0)

How does that compare to the number 47?

I'm asking here of memory usage. But as a more open question, if you know, what are the advantages/disadvantages or uses of BitSet (WR to other common data types).

Thanks,

Skuge
  • 1,010
  • 2
  • 11
  • 28
  • possible duplicate of [Boolean\[\] vs. BitSet: Which is more efficient?](http://stackoverflow.com/questions/605226/boolean-vs-bitset-which-is-more-efficient) – Thomas Jung Jun 29 '10 at 13:29
  • 3
    Perhaps you should give us a higher-level statement of the problem you're trying to solve instead of three variant questions about very low-level data structure properties. – Randall Schulz Jun 29 '10 at 13:49
  • Thanks Thomas, that post gave me more insight about BitSet. I still want to know if I can gain space by representing other structures by a BitSet. I guess everithing will be clearer if someone can clarify how BitSet is implemented. Thanks, – Skuge Jun 29 '10 at 14:13
  • Thomas: The context here is Scala, which provides its own `BitSet`, though an `Array[Boolean]` is the same thing as `Boolean[]` in Java. Closely related, but not exact duplicate. – Daniel C. Sobral Jun 29 '10 at 14:14
  • @Daniel It's to similar to be interesting. If it would support compressed bitsets ... but both implementations have the same idea you can read the Java or Scala lib codes it is the same. – Thomas Jung Jun 29 '10 at 16:13

1 Answers1

16

You can look at the implementation of BitSet in Scala 2.8 here: scala.collection.mutable.BitSet.

It is implemented based on an array of Longs. The size of the array depends only on the highest number stored in it. Divide the highest number stored in it by 64, rounding up, and you have the size of the array. Each element in the array consumes 8 bytes.

That means that dividing by 8 the greatest number stored in it, roughly yields the size in bytes of the BitSet. "Roughly" because of virtual machine memory management overheads, because the pointer to the array also needs some memory and because the array itself has some overhead.

The order of insertion or the actual number of elements stored in the BitSet have no influence on the size of the memory allocated.

For the two examples you give, only one Long-element is required to store the numbers, using 8 bytes of memory, as in both examples the highest number is less than 64.

An array of Ints, storing any five numbers, would consume 5 * 4 bytes = 20 bytes plus overhead. For storing n numbers you need roughly n * 4 bytes.

So you are comparing (highestNumberStored / 8) bytes against (countOfNumbersStored * 4) bytes.

Ruediger Keller
  • 3,024
  • 2
  • 20
  • 17