4

Given two Sets: how to compare them efficiently in Java?

  • (a) keep them as Lists, sort them and compare them. (Comparable)
  • (b) keep them as Sets and compare the hashCode of the Sets?

background:

many comparisons need to be done Sets are small (usually < 5 elements per set).

Roman C
  • 49,761
  • 33
  • 66
  • 176
user1654885
  • 131
  • 1
  • 3
  • 8
  • 7
    `set1.equals(set2);` returns true if the 2 sets contain the same elements... – assylias Nov 13 '12 at 12:23
  • thank you for this answer. but is this more efficient than keeping the sets in lists and sorting them + comparsion? – user1654885 Nov 13 '12 at 12:24
  • Are you implementing Set yourself? Because the existing all have `equals` already? Or are you talking about comparing them for ordering (if so, explain how that is supposed to work)? – Thilo Nov 13 '12 at 12:24
  • no. actually i can keep my "data" in sets or lists. i just want to compare sets (e.g. two hashsets, treesets etc) whatever is most efficient. and i wonder how to compare them in an efficient way. – user1654885 Nov 13 '12 at 12:25
  • The two ways you said will very possibly give different results - and I think that sets naturally aren't comparable...... – luiges90 Nov 13 '12 at 12:26
  • What do you mean by compare? See if they are equal (i.e. contain equal elements)? Or see which one "comes first"/"is greater"? – Thilo Nov 13 '12 at 12:26
  • no they won't. and sets are comparable. (every element of setA is contained in setB and vice-versa)... – user1654885 Nov 13 '12 at 12:27
  • They are comparable in terms of equality, but if you want to order the sets then you would need some way to rank one set against another. – The Cat Nov 13 '12 at 12:28
  • if you mean "every element of setA is contained in setB " use http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Set.html#containsAll(java.util.Collection) – luiges90 Nov 13 '12 at 12:28
  • ok sorry my fault. I should have defined compare before: I would like to check weather the two sets consist of exactly the same elements. – user1654885 Nov 13 '12 at 12:29

4 Answers4

9

The proper way to compare two sets is to use the equals method. I would not worry about performance unless you have proven that this is a part of your code that is causing performance issue (which I doubt). And considering the size of your sets (5 elements) this will be very fast (probably sub millisecond).

keep them as lists, sort them and compare them. (comparable)

will certainly be slower as you will need to copy the elements, sort them and compare.

keep them as sets and compare the hashcode of the sets?

if 2 sets are equal (have the same content) they will have the same hashcode. The reciprocal is not true: 2 sets with different content may have the same hashcode. Also note that for a HashSet for example, the hashcode is calculated by iterating over all the elements so it is not a free operation.

assylias
  • 321,522
  • 82
  • 660
  • 783
  • 1
    Sub-millisecond would be slow. There is a good chance that it will be sub-microsecond, and even less if sets are unequal most of the time. – Stephen C Nov 13 '12 at 12:37
  • 1
    For HashSets this boils down to five hash lookups; for the dumbest approach imaginable, two five-element lists, this would still only entail a worst case of 25 `equals` comparisons. – Marko Topolnik Nov 13 '12 at 13:04
2

What's wrong with equals? The docs states that it returns true if both are of same size and if containsAll() returns true, sounds pretty efficient to me.

In any case, you should never compare the hashcode to test for equality, two different objects might have the same hashcode.

Update: As noted in the comments (and in assylias' answer) the hashcode can be used as part of the equality test logic (different hashcodes imply different objects - but not the reverse). My remark above means that the hashcode alone is not (in general) enough.

leonbloy
  • 73,180
  • 20
  • 142
  • 190
  • -1 for the hashcode remark. If you have a hashcode readily available, you should use it. Because if the codes differ, the objects are not equal. That is how String#equals is implemented. (Of course, if the hashcodes are equal, that tells you nothing). – Thilo Nov 13 '12 at 12:28
  • @Thilo Actually in the case of a set, the hashcode is not cached like in String, so its computation requires a loop over each element. – assylias Nov 13 '12 at 12:33
  • Yes, the hashcode is probably not cached in a Set (you cannot really say, because Set is an interface, in an immutable Set it might well be). But Set should look at size() first, which works pretty much in the same way. – Thilo Nov 13 '12 at 12:36
  • 1
    Though improperly stated, the hashcode of a set should ***absolutely*** not be used to make a determination about its equality to another set. An edit is in order but not a down vote, imo. – Perception Nov 13 '12 at 12:54
0

Assuming you want to do a comparison whether set1 has exactly the same elements of set2.

set1.equals(set2) and also set2.equals(set1) as to make sure both are exactly the same.

assylias
  • 321,522
  • 82
  • 660
  • 783
  • 6
    Only one comparison is necessary. Equals is symmetric: `set1.equals(set2)` will always return the same result as `set2.equals(set1)` (unless one of them is null of course). – assylias Nov 13 '12 at 13:30
0

If you have two HashSets, comparing them by Set.equals will be O(n) because only one set needs to be iterated through, and the other will be checked by contains, which is itself O(1).

Note that for sets as small as yours the difference between O(n) and O(n2) is neglible, so even the naïvest approaches will yield good performance.

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436