9

I have a HashSet of byte[]s and I would like to test whether a new byte[] is in that set. The problem is that Java seems to be testing whether the byte[] instances are the same rather than testing whether the actual values in the byte arrays are the same.

In other words, consider the following code:

public class Test
{
    public static void main(String[] args)
    {
        java.util.HashSet<byte[]> set=new java.util.HashSet<byte[]>();
        set.add(new String("abc").getBytes());
        System.out.println(set.contains(new String("abc").getBytes()));
    }
}

This code prints out false and I would like it to print out true. How should I go about doing this?

Jack Edmonds
  • In all these answers, be wary of changing any elements of a byte array that is in the set; doing so will affect its hash and its equality, but will not change the hash bucket it's currently stored in. – Lawrence Dol Jun 29 '10 at 04:23
  • possible duplicate of [how to make a set of array in java?](http://stackoverflow.com/questions/9841934/how-to-make-a-set-of-array-in-java) – Raedwald Feb 26 '15 at 13:19

5 Answers

7

You can wrap each byte array using ByteBuffer.wrap, which gives you content-based equals and hashCode behavior. Just be careful which methods you call on the ByteBuffer afterwards: don't modify the underlying array or advance the buffer's position, since both change what equals and hashCode see.
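A minimal sketch of this approach, assuming a plain HashSet<ByteBuffer>; the class name ByteBufferSetDemo and the helper containsBytes are made up for illustration. The last lookup shows why the array must not be modified once it is in the set.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

class ByteBufferSetDemo {
    // Hypothetical helper: wraps the candidate (no copy) and looks it up by content.
    static boolean containsBytes(Set<ByteBuffer> set, byte[] candidate) {
        return set.contains(ByteBuffer.wrap(candidate));
    }

    public static void main(String[] args) {
        byte[] stored = "abc".getBytes(StandardCharsets.UTF_8);
        Set<ByteBuffer> set = new HashSet<>();
        set.add(ByteBuffer.wrap(stored));

        System.out.println(containsBytes(set, "abc".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(containsBytes(set, "abd".getBytes(StandardCharsets.UTF_8))); // false

        // Mutating the backing array after insertion breaks the lookup:
        // the entry still sits in the bucket chosen for the old content.
        stored[0] = 'x';
        System.out.println(containsBytes(set, "abc".getBytes(StandardCharsets.UTF_8))); // false
    }
}
```

ByteBuffer.wrap does not copy the array, so the set entry and the original array share the same bytes.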

Kevin Bourrillion
2

You could create a ByteArray class that wraps the byte arrays and tests for equality the way you want. Then you'd have a Set<ByteArray>.
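One way such a wrapper might look, as a sketch rather than a definitive implementation. The defensive copy in the constructor is an added assumption (it is not part of the answer) that keeps later changes to the caller's array from altering the hash.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ByteArray {
    private final byte[] data;

    ByteArray(byte[] data) {
        // Assumption: copy the input so the wrapper is effectively immutable.
        this.data = data.clone();
    }

    @Override
    public boolean equals(Object other) {
        // Equal when the other object is a ByteArray with the same contents.
        return other instanceof ByteArray && Arrays.equals(data, ((ByteArray) other).data);
    }

    @Override
    public int hashCode() {
        // Content-based hash, consistent with equals.
        return Arrays.hashCode(data);
    }

    public static void main(String[] args) {
        Set<ByteArray> set = new HashSet<>();
        set.add(new ByteArray(new byte[] {97, 98, 99}));
        System.out.println(set.contains(new ByteArray(new byte[] {97, 98, 99}))); // true
    }
}
```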

Adam Crume
1

Modern (as of right now) solution

import com.google.common.collect.ImmutableSet;

import java.nio.ByteBuffer;
import java.util.Set;

import static com.google.common.base.Charsets.UTF_8;
import static java.nio.ByteBuffer.wrap;

public class Scratch
{
    public static void main(String[] args)
    {
        final Set<ByteBuffer> bbs = ImmutableSet.of(wrap("abc".getBytes(UTF_8)).asReadOnlyBuffer());
        System.out.println("bbs.contains(ByteBuffer.wrap(\"abc\".getBytes(Charsets.UTF_8))) = " + bbs.contains(wrap("abc".getBytes(UTF_8)).asReadOnlyBuffer()));
    }
}

NOTES:

You should never convert a String to a byte[] without providing a Charset. Otherwise the result depends on the runtime's default Charset, which is usually not a good one and can change.
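To illustrate the point, here is a small sketch that uses the JDK's own java.nio.charset.StandardCharsets (Java 7+) in place of Guava's Charsets; the class name CharsetDemo and the helper utf8 are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetDemo {
    // Hypothetical helper: always encodes with an explicit charset.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(utf8("abc"))); // [97, 98, 99]

        // The same character yields different bytes under different charsets,
        // which is why relying on the platform default is fragile.
        String eAcute = "\u00e9";
        System.out.println(utf8(eAcute).length);                                // 2
        System.out.println(eAcute.getBytes(StandardCharsets.ISO_8859_1).length); // 1
    }
}
```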

.asReadOnlyBuffer() is important!

Creates a new, read-only byte buffer that shares this buffer's content. The content of the new buffer will be that of this buffer. Changes to this buffer's content will be visible in the new buffer; the new buffer itself, however, will be read-only and will not allow the shared content to be modified.

The two buffers' position, limit, and mark values will be independent.

The new buffer's capacity, limit, position, and mark values will be identical to those of this buffer. If this buffer is itself read-only then this method behaves in exactly the same way as the duplicate method.

1

You could define your own wrapper class, but probably the easiest thing to do is to "wrap" the arrays in ArrayLists and use a HashSet<ArrayList<Byte>>.
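A rough sketch of the list-based variant; the class name ByteListDemo and the helper toList are illustrative names only. Each byte gets boxed into a Byte, so this costs more memory than the wrapper approaches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ByteListDemo {
    // Hypothetical helper: copies a byte[] into a List<Byte>,
    // whose equals and hashCode are content-based.
    static List<Byte> toList(byte[] bytes) {
        List<Byte> list = new ArrayList<>(bytes.length);
        for (byte b : bytes) {
            list.add(b);
        }
        return list;
    }

    public static void main(String[] args) {
        Set<List<Byte>> set = new HashSet<>();
        set.add(toList(new byte[] {97, 98, 99}));
        System.out.println(set.contains(toList(new byte[] {97, 98, 99}))); // true
    }
}
```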

President James K. Polk
0

You can avoid wrappers and the stupid hashCode problem (hey, a standard thing like a byte[] doesn't have a content-based hashCode, right?):

Use TreeSet instead of HashSet and provide a byte[] comparator at instantiation time:

  Set<byte[]> byteATreeSet = new TreeSet<byte[]>(new Comparator<byte[]>() {
      public int compare(byte[] left, byte[] right) {
          // Compare element by element as unsigned values; the first difference decides.
          for (int i = 0; i < left.length && i < right.length; i++) {
              int a = (left[i] & 0xff);
              int b = (right[i] & 0xff);
              if (a != b) {
                  return a - b;
              }
          }
          // One array is a prefix of the other: the shorter one sorts first.
          return left.length - right.length;
      }
  });

If you get a HashSet<byte[]> b from somewhere else, first initialize your variable a as a TreeSet with this comparator and then call a.addAll(b). This way, even if b contains duplicates, a does not.
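The addAll step could be sketched like this. It assumes Java 9 or later, where Arrays.compareUnsigned(byte[], byte[]) provides the same unsigned lexicographic ordering as the hand-written comparator; the class name TreeSetDedupDemo and the helper dedupe are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDedupDemo {
    // Hypothetical helper: copies the arrays into a TreeSet ordered by content,
    // so arrays with equal contents collapse into a single entry.
    static Set<byte[]> dedupe(Set<byte[]> source) {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned; // Java 9+
        Set<byte[]> result = new TreeSet<>(unsignedOrder);
        result.addAll(source);
        return result;
    }

    public static void main(String[] args) {
        Set<byte[]> b = new HashSet<>();
        b.add(new byte[] {1, 2, 3});
        b.add(new byte[] {1, 2, 3}); // distinct instance, same contents
        System.out.println(b.size()); // 2

        Set<byte[]> a = dedupe(b);
        System.out.println(a.size());                       // 1
        System.out.println(a.contains(new byte[] {1, 2, 3})); // true
    }
}
```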

ib84
  • Worth noting that TreeSet has worse time complexity for the add, remove, and contains methods than HashSet (O(log n) vs. O(1)), which might be an important factor. – cbartosiak Jan 22 '19 at 16:51