2

I would like to convert a String consisting of 0's and 1's to an array of bits.
The String is of length ~30000 and is sparse (mostly 0s, few 1s)
For example, given a string
"00000000100000000010000100000000001000"
I would like to convert it to an array of bits which will store
[00000000100000000010000100000000001000]

I am thinking of using BitSet or OpenBitSet Is there a better way? The use case is to perform logical OR efficiently.

I am thinking along these lines

final OpenBitSet logicalOrResult = new OpenBitSet(); 
for (final String line : lines) {

   final OpenBitSet myBitArray = new OpenBitSet(); 
   int pos = 0;
   for (final char c : str.toCharArray()) {
         myBitArray.set(pos) = c;
         pos++;
   }
   logicalOrResult.or(myBitArray);
}
Tad
  • 838
  • 2
  • 11
  • 22

3 Answers3

2

BigInteger can parse it and store it, and do bitwise operations:

BigInteger x = new BigInteger(bitString, 2);
BigInteger y = new BigInteger(otherBitString, 2);
x = x.or(y);
System.out.println(x.toString(2));
Boann
  • 48,794
  • 16
  • 117
  • 146
  • do you know whether there are any benchmarks between BigInteger and OpenBitSet and BitSet (or any other library) for performing logical OR? – Tad Oct 02 '14 at 01:46
  • @Tad I do not know, but you can easily compare them by doing the benchmarks within the intended application. – Boann Oct 02 '14 at 01:51
1

A BitSet ranging over values between 0 and 30000 requires a long array of size less than 500, so you can assume that BitSet.or (or the respective OpenBitSet method) will be sufficiently fast, despite the sparsity. It looks like OpenBitSet has better performance than BitSet, but apart from this it doesn't really matter which you use, both will implement or efficiently. However, be sure to pass the length of the String to the (Open)BitSet constructor to avoid reallocations of the internal long array during construction!

If your strings are much longer and your sparsity is extreme, you could also consider storing them as a sorted list of Integers (or ints, if you use a library like Trove), representing the indices which contain a 1. A bitwise or can be implemented in a merge(sort)-like fashion, which is quite efficient (time O(n + m), where n, m are the numbers of ones in each string). I suspect that in your scenario it will be slower than the BitSet approach though.

misberner
  • 3,488
  • 20
  • 17
  • my BitSet will range over values 1 and 0 - not 0 and 30000. Is this what you mean? – Tad Oct 02 '14 at 11:45
  • 1
    No. Your `BitSet` *represents* a set of integers between 0 and some upper limit. E.g., the bitset 01001101 represents the set {0, 2, 3, 6} - the set of indices set to 1. This is also what constitutes the `toString()` representation of a `BitSet`. Since your strings are of length ~30000, the corresponding set of 1-indices ranges over values between 0 and 30000. – misberner Oct 02 '14 at 18:40
0

You can iterate through each character:

boolean[] bits = new boolean[str.length];

for (int i=0;i<str.length;i++) {
    if (str.charAt(i).equals("1")
        bits[i] = true;
    else if (str.charAt(i).equals("0")
        bits[i] = false;
}

If you want to be memory efficient, you could try RLE (Run Length Encoding).

Anubian Noob
  • 13,426
  • 6
  • 53
  • 75
  • I like this approach but I have read in http://stackoverflow.com/questions/383551/what-is-the-size-of-a-boolean-variable-in-java that using BitSet has a better optimization than an array of booleans. I was also considering RLE, which would be great since my array is sparse, but I am not sure how to do logical OR on compressed array - I guess I would need to decompress it first, unless I am missing something. – Tad Oct 02 '14 at 01:55