3

This has caught my interest and attention since yesterday. I am trying to store bits in Java and hit by Memory Overhead.

My first question regarding same is What is size of my Bitset?

Based on the answers I looked at other references and found Memory Usage guide.

Then I looked at BitSet source code which looks like

public class BitSet implements Cloneable, java.io.Serializable {
    /*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */
    private final static int ADDRESS_BITS_PER_WORD = 6;
    private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
    private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

    /* Used to shift left or right for a partial word mask */
    private static final long WORD_MASK = 0xffffffffffffffffL;

    /**
     * @serialField bits long[]
     *
     * The bits in this BitSet.  The ith bit is stored in bits[i/64] at
     * bit position i % 64 (where bit position 0 refers to the least
     * significant bit and 63 refers to the most significant bit).
     */
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("bits", long[].class),
    };

    /**
     * The internal field corresponding to the serialField "bits".
     */
    private long[] words;

    /**
     * The number of words in the logical size of this BitSet.
     */
    private transient int wordsInUse = 0;

    /**
     * Whether the size of "words" is user-specified.  If so, we assume
     * the user knows what he's doing and try harder to preserve it.
     */
    private transient boolean sizeIsSticky = false;

    /* use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 7997698588986878753L;

    /**
     * Given a bit index, return word index containing it.
     */
    private static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }

.....
}

As per the calculation based on Memory Guide, this is what I calculated

8  Bytes: housekeeping space
12 Bytes: 3 ints
8  Bytes: long
12 Bytes: long[]
4  Bytes: transient int // does it count?
1  Byte : transient boolean
3  Bytes: padding

This sums to 45 + 3 bytes (padding to reach multiple of 8)

This means an empty BitSet itself reserves 48 bytes.

But my requirement was to store bits, What am I missing? What are my options here?

Thanks much

UPDATE

My requirement is that I want to store total of 64 bits in two separate fields

class MyClass{
    BitSet timeStamp
    BitSet id
}

and I want to store millions of MyClass objects in memory

Community
  • 1
  • 1
daydreamer
  • 87,243
  • 191
  • 450
  • 722
  • How many bits do you need to store? If millions, then the 48 bytes overhead is negligible vs the actual bits stored. If only a few then you could just use an int or a long or a few longs or a long[]. – assylias Oct 21 '15 at 14:21
  • You forgot to specify how many bits you want to store. The cheapest way to store "bits" is inside a primitive type, e.g. int = 32 bits. What you perceive as a lot of overhead in BitSet doesn't amount to much if you need to store many bits (thousands or more). – Durandal Oct 21 '15 at 14:21
  • Just update the requirement. thanks for mentioning it – daydreamer Oct 21 '15 at 14:23
  • Yes BitSet begins to become interesting with say 256 bits. If you can live with 64 bits, use long. And BitSet offers easy operations of course. – Joop Eggen Oct 21 '15 at 14:23
  • Do you want to pack all the MyClasses together or keep them like this? – harold Oct 21 '15 at 14:25
  • @harold, can you please explain? My idea is that I should be able to `sort/order` them as well based on `timsStamp`, so I will put them in a `Collection` – daydreamer Oct 21 '15 at 14:27
  • `static`-fields aren't instance-bound. This means they don't take away any space in the object at all. –  Oct 21 '15 at 14:29
  • I don't know why you're counting the static fields as overhead. – Paul Kienitz Oct 21 '15 at 14:32
  • Why model *simple* data as bits? Your requirement looks like its best served by modeling the data as primitive fields, e.g. two simple int's. – Durandal Oct 21 '15 at 14:32
  • @Durandal, I want to save `System.currentTimeInMillis` which gives `long`. Is it safe to case this `long` into `int`? – daydreamer Oct 21 '15 at 14:35
  • Note that Java 10 might get support for primitives as generic type parameters and "value types". Then it might be possible to actually implement Set as a type that doesn't need so much overhead. It could be a value type and just use 64 bits like a "long". Those would be exactly the kind of language features you need. – Claude Martin Oct 21 '15 at 14:42
  • Oh, so you meant *two times 64 bits*? I took your edit as "64 bits total". But my suggestion still stands. Model them as primitives, just use long where needed. – Durandal Oct 21 '15 at 14:46
  • @daydreamer like put them all in a packed array, with no object wrappers. You could still sort them, but it would be trickier, so probably not a great fit in this case. – harold Oct 21 '15 at 14:49

3 Answers3

4

My requirement is that I want to store total of 64 bits in two separate fields

So just use long (64 bit integer). And just use that as a bit field. I once needed something like that, but 32 bit was enough for me, so wrote a little library class to use an int as a bit set: https://github.com/claudemartin/smallset

Feel free to fork it and just replace int by long, 32 by 64, 1 by 1L etc.

Claude Martin
  • 745
  • 6
  • 21
3

This sums to 45 + 3 bytes (padding to reach multiple of 8) This means an empty BitSet itself reserves 48 bytes.

First of all I want to advise the right tool to analyze object layout schemes in JVMs - JOL. In your case(java -jar jol-cli/target/jol-cli.jar internals java.util.BitSet) JOL produces the following result:

Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

java.util.BitSet object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0     4         (object header)                01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4         (object header)                00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4         (object header)                f4 df 9f e0 (11110100 11011111 10011111 11100000) (-526393356)
     12     4     int BitSet.wordsInUse              0
     16     1 boolean BitSet.sizeIsSticky            false
     17     3         (alignment/padding gap)        N/A
     20     4  long[] BitSet.words                   [0]
Instance size: 24 bytes (reported by Instrumentation API)
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total

Your calculations were not correct because of static fields, thus an empty BitSet itself reserves 24 bytes. Please note that these calculations are not 100% exact because it was not taken into account size of long[] object. So the right results are java -jar jol-cli/target/jol-cli.jar externals java.util.BitSet:

Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

java.util.BitSet@6b25f76bd object externals:
          ADDRESS       SIZE TYPE             PATH                           VALUE
        7ae321a48         24 java.util.BitSet                                (object)
        7ae321a60         24 [J               .words                         [0]

This means an empty BitSet itself uses 48 bytes including long array. In order to optimize memory footprint you can write your own implementation of BitSet. For example in your use case following options are available:

public class MyOwnBitSet {
    long word1;
    long word2;
}

public class MyOwnBitSet2 {
    long[] word = new long[2];
}

public class MyOwnBitSet3 {
    int index;
}

JOL produces the following result:

MyOwnBitSet@443b7951d object externals:
          ADDRESS       SIZE TYPE                                                   PATH                           VALUE
        76ea4c7f8         32 MyOwnBitSet                                (object)


MyOwnBitSet2@69663380d object externals:
          ADDRESS       SIZE TYPE                                                    PATH                           VALUE
        76ea53800         16 MyOwnBitSet2                                (object)
        76ea53810         32 [J                                                      .word                          [0, 0]


MyOwnBitSet3@5a2e4553d object externals:
          ADDRESS       SIZE TYPE                                                    PATH                           VALUE
        76ea5c070         16 MyOwnBitSet3                                (object)

Let me explain the last example MyOwnBitSet3. In order to decrease memory footprint you can preallocate a huge array of long/int objects and store only pointer on the right cell. With a sufficiently large number of objects, this option is the most advantageous.

Ivan Mamontov
  • 2,874
  • 1
  • 19
  • 30
  • great insight @Ivan, Thanks a lot. `MemOwnBitSet3` does not suit my requirements because I need to generate `ID (timestamp, uniqueId)` continuously, thile `timestamp` is something I can get from `System.nanoTime`, `uniqueId` will be for uniquely identifying the machine. – daydreamer Oct 22 '15 at 13:08
0

To store a total of 64-bits in an object you can do

class MyClass{
    int timeStamp
    int id
}

or if you don't want to overhead of an object you can do

long timeStampAndId;

The problem is how to encapsulate your operations. For a primitive. Java doesn't help much but what you can do is

enum TimeStampAndId {
    /* no instances */ ;
    public static boolean isTimeStampSet(long timeStampAndId, int n) { ... }
    public static boolean isIdSet(long timeStampAndId, int n) { ... }

i.e. use a utility class to support the primitive type.

In the future Java will support Value Types which won't have an object overhead.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130