I have two Java objects, each with a byte[] field whose size is on the order of millions of bytes. What is the fastest, most efficient way to check deep equality for these objects?

Sample Entity:

@Entity
public class NormalBook
{

  @Id
  private String bookId;

  @Column
  private String title;

  @Column
  private byte[] pdfFile;

  //setters and getters

}

Note: I am doing this for an ORM tool. Basically, I am comparing an object (which is in the managed state) with an object present in the persistence context.

Dev
  • If you do more than one comparison, it probably pays to calculate a checksum/hashcode for each. – biziclop Apr 17 '15 at 09:51
  • But it's quite a strange thing to do...what is the actual problem you're trying to solve? – biziclop Apr 17 '15 at 09:53
  • How did the 'object' end up in a jar file? Does the jar file contain serialized java objects? (Be aware that two serialized Java objects that are `equals()` - or even exactly the same instance - may produce a different sequence of bytes each time serialized.) – Paul Apr 17 '15 at 10:00
  • @Paul The jar file is an example. I have attached a sample entity which may give you more insight. – Dev Apr 17 '15 at 10:05
  • In the case of an ORM, why is it not enough to compare only the bookId? – Saif Apr 17 '15 at 10:06
  • Who said it's enough? – Dev Apr 17 '15 at 10:07
  • Do you really want to be storing the *contents* of the book, via ORM, into - I presume - a RDBMS? That sounds like a seriously Bad Idea. – Paul Apr 17 '15 at 10:12
  • @Paul It's not for an RDBMS. It's for a NoSQL database. – Dev Apr 17 '15 at 10:13
  • Still, doing this would seem to mandate loading the entire *contents* of the book into the ORM. Why do that? Why not just store a relative path to a file on disk? – Paul Apr 17 '15 at 10:15
  • @dev In that case you'll save yourself an awful lot of hassle if you also store the checksum of the file in the database. – biziclop Apr 17 '15 at 10:16
  • True. But we still don't know why the OP *wants* to check if the *contents* (i.e. bytes) of two books are the same. Or even if this is merely an artifact of trying to implement equality on an object that is unnecessarily storing the contents, when it doesn't need to (not through the ORM, anyway). – Paul Apr 17 '15 at 10:18

3 Answers

Override equals() or have a *helper method* (bad option!) and do it in 5 steps:

1. Check for *not null*.
2. Check for same *type*.
3. Check for *size of byte[]*.
4. Check for `==` (*reference equality* of byte[]) 
5. Start comparing byte values 
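The five steps above can be sketched roughly as follows. This is only an illustration: the class name `ByteHolder` and its `data` field are hypothetical, and the explicit null-array check is an extra guard that the list does not mention.

```java
import java.util.Arrays;

// Hypothetical holder class; the name and field are assumptions for illustration.
final class ByteHolder {
    private final byte[] data;

    ByteHolder(byte[] data) {
        this.data = data;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) return false;                   // step 1: not null
        if (getClass() != obj.getClass()) return false;  // step 2: same type
        byte[] a = this.data;
        byte[] b = ((ByteHolder) obj).data;
        if (a == null || b == null) return a == b;       // extra guard: null arrays are equal only if both are null
        if (a.length != b.length) return false;          // step 3: size of byte[]
        if (a == b) return true;                         // step 4: reference equality
        for (int i = 0; i < a.length; i++) {             // step 5: compare byte values
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Kept consistent with equals: equal content gives an equal hash code.
        return Arrays.hashCode(data);
    }
}
```

The worst case is still linear in the array length; the earlier steps only let the method return early when the answer is already obvious.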
TheLostMind
  • If I compare byte values (step 5), don't you think it will take a lot of time for a large object? – Dev Apr 17 '15 at 09:47
  • @dev Of course the time will be proportional to the object size. What other option do you have for comparing, other than actually comparing? – xlecoustillier Apr 17 '15 at 09:49
  • @dev - Well, if you want to check *value* then you will have to do it :) . `Arrays.equals()` will cover steps 3, 4 and 5. – TheLostMind Apr 17 '15 at 09:50

Use the following in the definition of equals() on your object's class:

java.util.Arrays.equals(bs1, bs2)

You might also want to check whether they are the same array (instance) first, though this method may well do that anyway (the OpenJDK implementation does).

For example (and making some assumptions about your class that contains the arrays):

@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (!(obj instanceof MyObject)) // also covers the case where obj is null
        return false;
    // Arrays here is java.util.Arrays
    return Arrays.equals(this.bytes, ((MyObject) obj).bytes);
}

If there are other fields in your class, your equals() should take those into account too.
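As a rough sketch of what that could look like for the entity in the question: the class below is a simplified stand-in called `NormalBookSketch` (a hypothetical name), with the JPA annotations left out so that it compiles without a persistence library. It compares the cheap fields first, so the large array is only scanned when everything else already matches.

```java
import java.util.Arrays;
import java.util.Objects;

// Simplified stand-in for the question's NormalBook entity (hypothetical name);
// JPA annotations are omitted so the sketch is self-contained.
final class NormalBookSketch {
    private final String bookId;
    private final String title;
    private final byte[] pdfFile;

    NormalBookSketch(String bookId, String title, byte[] pdfFile) {
        this.bookId = bookId;
        this.title = title;
        this.pdfFile = pdfFile;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof NormalBookSketch)) return false; // also rejects null
        NormalBookSketch other = (NormalBookSketch) obj;
        // && short-circuits, so the byte[] comparison runs only if the
        // inexpensive String fields are already equal.
        return Objects.equals(bookId, other.bookId)
                && Objects.equals(title, other.title)
                && Arrays.equals(pdfFile, other.pdfFile);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals, so equal objects share a hash code.
        return 31 * Objects.hash(bookId, title) + Arrays.hashCode(pdfFile);
    }
}
```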

(There might be a better answer to the question if you could provide more information about what kind of data is stored in the arrays.)

Paul

If your class has fields like byte[], you can use something like:

import java.util.Arrays;

public class MyClass {

    byte[] a;

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        MyClass other = (MyClass) obj;
        return Arrays.equals(a, other.a);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(a); // keep hashCode consistent with equals
    }
}

If you are concerned about performance and can guarantee a unique hash code (this is important: the hash code needs to be unique), then you can just compare the hash codes.
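One way to approximate this idea is sketched below, under the assumption that a SHA-256 digest is computed once per array and kept alongside it. The class `DigestedBytes` and its methods are hypothetical names. A digest is not guaranteed to be unique either, so the sketch treats a mismatch as proof of inequality and still falls back to a full comparison when the digests match.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical wrapper that caches a SHA-256 digest of its content.
final class DigestedBytes {
    private final byte[] data;
    private final byte[] sha256; // computed once, in time linear in data.length

    DigestedBytes(byte[] data) {
        // Defensive copy: the cached digest is only valid if the bytes never change.
        this.data = data.clone();
        this.sha256 = sha256(this.data);
    }

    static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a standard algorithm name, so this is not expected in practice.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    boolean sameContentAs(DigestedBytes other) {
        // Different digests prove different content: a cheap 32-byte comparison.
        if (!Arrays.equals(sha256, other.sha256)) return false;
        // Matching digests do not strictly prove equality, so confirm byte by byte.
        return Arrays.equals(data, other.data);
    }
}
```

This only pays off when the same array is compared more than once, or when the digest can be stored with the data, since computing it costs at least as much as a single comparison.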

Saif
  • Good point. You should always implement `hashCode()` too, when you override `equals()`. – Paul Apr 17 '15 at 10:01
  • How would you assure a unique hashcode? In general, you can't. – user253751 Apr 17 '15 at 10:03
  • However, hash codes (in Java) *do not* need to be - and usually *are not* - unique. (They are NOT ids.) Aside: I once spent two weeks fixing a system written by someone who erroneously believed hash codes are always unique. `hashCode()` return values should be well-distributed however. *Effective Java* (Joshua Bloch) provides a good summary of when - and how - to override `hashCode()`. – Paul Apr 17 '15 at 10:07