5

In Java 8 the below code converts integer 3 to bitset and prints {0, 1} meaning the bit representation of 3 has ones at positions 0 and 1 that is 11.

System.out.println(BitSet.valueOf(new long[]{3}));

I'm interested in converting a BigInteger or long with large value, say 10000000 into the corresponding BitSet and then back - having the BitSet object (bit representation) convert it to BigInteger (long). I'm also wondering which representation occupies more memory for various values of integer numbers?

Stephan Rozinsky
  • 553
  • 2
  • 6
  • 21
  • an additional remark: does endianness matters, because that's an important aspect. `3` is now represented as `{0,1}`. So the `i`-th bit takes `2^i` as value. – Willem Van Onsem Apr 08 '15 at 23:43

2 Answers2

4

As BigInteger is big-endian, while BitSet is little-endian, the bytes will be reversed when trying to convert directly from one to the other via toByteArray(). The simplest solution would be to just reverse them again:

/**
 * Converts from BigInteger to BitSet. The BigInteger must be non-negative,
 * otherwise an IllegalArgumentException is thrown.
 */
public static BitSet toBitSet(BigInteger val) {
    if(val.signum() < 0)
        throw new IllegalArgumentException("Negative value: " + val);
    return BitSet.valueOf(reverse(val.toByteArray()));
}
/**
 * Converts from BitSet to BigInteger.
 */
public static BigInteger toBigInteger(BitSet val) {
    return new BigInteger(1, reverse(val.toByteArray()));
}
/**
 * Reverses and returns the specified byte array.
 */
private static byte[] reverse(byte[] bytes) {
    for(int i = 0; i < bytes.length/2; i++) {
        byte temp = bytes[i];
        bytes[i] = bytes[bytes.length-i-1];
        bytes[bytes.length-i-1] = temp;
    }
    return bytes;
}

By the way, if you are only interested in the bit indices, you may use BigInteger#testBit instead.

Lone nebula
  • 4,768
  • 2
  • 16
  • 16
  • this reverses only the order of the bytes (groups of 8 bits). The order of the bits must be reversed as well. And all this requires the exact length of the wanted result, to accommodate any trailing zeros – Radu Simionescu Dec 23 '15 at 23:07
  • 1
    @Radu Simionescu: This is wrong. BitSet uses the LITTLE_ENDIAN bit order (in contrast to the BIG_ENDIAN byte order if loaded from a byte[]) – Heri Apr 02 '17 at 18:08
  • the reverse is necessary when manipulating bits between bigInteger and bitSet, upvote for this answer. – coldestlin Feb 22 '21 at 03:19
2

You can use the BigInteger.toByteArray() and BitSet.toByteArray() for these:

BigInteger bi = new BigInteger("31415926535");
bi = bi.multiply(new BigInteger("271828"));
System.out.println(bi);
BitSet bs = BitSet.valueOf(bi.toByteArray());
System.out.println(bs);
BigInteger bi2 = new BigInteger(bs.toByteArray());
System.out.println(bi2);

You can use the constructor of the BigInteger(byte[]) to convert the array of bytes into a BigInteger and the BitSet.valueOf(byte[]) to convert the data into the desired values.

For the given code, it outputs:

8539728478155980
{1, 2, 3, 4, 9, 10, 12, 14, 17, 18, 20, 22, 23, 25, 27, 28, 29, 30, 32, 33, 35, 37, 38, 42, 44, 45, 50, 51, 54, 55}
8539728478155980

Note that the 2-complement notation is used. Thus for negative numbers, it will generate additional ones. And a 2^64-1 will require more than 64 bits to represent. This also works with big endian. (see modified answer below).

EDIT: there is however one problem with this approach: if there are tailing zeros, these are not taken into account by the program. This can be important because they are part of the representation. You can solve this problem by adding a tailing bit:

public static BitSet convertTo (BigInteger bi) {
    byte[] bia = bi.toByteArray();
    int l = bia.length;
    byte[] bsa = new byte[l+1];
    System.arraycopy(bia,0,bsa,0,l);
    bsa[l] = 0x01;
    return BitSet.valueOf(bsa);
}

public static BigInteger convertFrom (BitSet bs) {
    byte[] bsa = bs.toByteArray();
    int l = bsa.length-0x01;
    byte[] bia = new byte[l];
    System.arraycopy(bsa,0,bia,0,l);
    return new BigInteger(bia);
}

And call it with:

BigInteger bi = new BigInteger("20000000");
System.out.println(bi);

BitSet bs = convertTo(bi);
System.out.println(bs);

BigInteger bi2 = convertFrom(bs);
System.out.println(bi2);

About the memory usage

In general both methods will use approximately the same number of bits. For the first implementation (without size marker and thus errors), it is possible that sometimes, the BitSet approach will use one byte less than the BigInteger approach. With the size marker (second approach), it is the opposite: in general the BitSet will use always one extra byte, except for some rare occasions.

Daniel Le
  • 1,800
  • 1
  • 14
  • 23
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • 1
    It doesn't work. (Endianness is backwards.) Try it out and you will see what I mean. – Radiodef Apr 08 '15 at 23:40
  • @Radiodef: you mean it works with *big-endian*? Evidently, how else could you represent an arbitrary large number? – Willem Van Onsem Apr 08 '15 at 23:42
  • 3
    `BigInteger#toByteArray` returns a big-endian array and `BitSet` expects a little-endian array. Check the docs. – Radiodef Apr 08 '15 at 23:43
  • It works for integers of up to "20000000" value and does not work for others. Is it possible to use for bigger integers? – Stephan Rozinsky Apr 08 '15 at 23:43
  • @StephanRozinsky: can you edit your question and give some testcases, this probably has something to do with endianness, but you must be more specific what you expect for large numbers. As you can see, for a large value `8e16`, it still maintains the number... – Willem Van Onsem Apr 08 '15 at 23:46
  • See e.g. [http://ideone.com/z6VrA2](http://ideone.com/z6VrA2). I don't mean to rain on your parade, I was going to post the same answer and found this out. So you need to reverse the array too. – Radiodef Apr 08 '15 at 23:46
  • @StephanRozinsky: one moment, will try to solve it. Does the endianness matters? – Willem Van Onsem Apr 08 '15 at 23:51
  • I'm not an expert in this, though something does definitely matter :) – Stephan Rozinsky Apr 08 '15 at 23:52
  • @Radiodef Do you know how to make it working for any integer value? Did not get you about reversing the array. – Stephan Rozinsky Apr 08 '15 at 23:55
  • @StephanRozinsky: found it, there is a problem with tailing zero's. So now we need to encode the length as well. Give me a sec. – Willem Van Onsem Apr 08 '15 at 23:57
  • @StephanRozinsky: the only problem is that there is no direct one-on-one mapping between the `BigInteger` and the `BigSet`. In other words, a `BigSet` is, without any modifications, not sufficient to represent a `BigInteger` since the length of the vector is lost. – Willem Van Onsem Apr 08 '15 at 23:59
  • What type should I use to convert then? – Stephan Rozinsky Apr 09 '15 at 00:02
  • for the time being I will use this answer http://stackoverflow.com/a/2473719/4574913 – Stephan Rozinsky Apr 09 '15 at 00:07
  • @StephanRozinsky: solved the problem. But as said before, it will introduce one additional bit into the bitset: a size marker. – Willem Van Onsem Apr 09 '15 at 00:11
  • @StephanRozinsky: and for a `long` it will work evidently with the `toByteArray` because the number of bits is known in advance (64). – Willem Van Onsem Apr 09 '15 at 00:12
  • This wouldn't work as it should, as endianness does not have to do with the bit order, but rather the byte order in the byte array. – Lone nebula Aug 22 '15 at 12:02
  • I've tested this with over 10G of numbers, can you give me one single value where this method fails? Furthermore this method works with negative numbers whereas yours doesn't. The endianness is furthermore irrelevant anyway. As said in the answer, the problem is mainly that in the conversion back, the BigInteger doesn't know the amount of bits. By encoding this the problem is resolved, regardless of how one manage to encode this. – Willem Van Onsem Aug 22 '15 at 19:45
  • I apologize for my vague comment. Your code does certainly convert from `BigInteger` to `BitSet` and back to the same `BigInteger` without fail. However, I believe the the asker expected the bits in the `BitSet` to be ordered like the binary representation of the corresponding `BigInteger`, as he mentions in his first paragraph, using 3 as an example. In other words that `bi.testBit(n) == bs.get(n)`. I chose not to include negative numbers, as those would require infinite length if they were to fulfill this property. (For negative numbers, `bi.testBit(n)` is `1` for all `n > bi.bitLength()`.) – Lone nebula Aug 22 '15 at 22:48
  • I had misunderstood your reason for sticking to your answer despite being aware of the difference in endianness. As it was due to you interpreting the question differently, and had nothing to do with high-to-low reading of the binary number (which I previously thought), I will remove unrelated text from my answer. – Lone nebula Aug 23 '15 at 09:27
  • this answer will introduce bug if you want to manipulate bits when test value bigger than 1 byte (ie. 256) due to the endian order difference between bigInteger and bitSet, please refer to answer from @Lone nebula which indicates the reverse operation. – coldestlin Feb 22 '21 at 03:17