2

I have a method that checks if two objects are equal(by reference).

public boolean isUnique( T uniqueIdOfFirstObject, T uniqueIdOfSecondObject ) {
    return (uniqueIdOfFirstObject ==  uniqueIdOfSecondObject);
}

(Use case) Assuming that I don't have any control over creation of the object. I have a method

void currentNodeExistOrAddToHashSet(Object newObject, HashSet<T> objectHash) {
    // will it be 100% precise? Assuming both object have the same field values.
    if(!objectHash.contains(newObject){
        objectHash.add(newObject);
    }
}

or I could do something like this

void currentNodeExistOrAddToHashSet(Object newObject, HashSet<T> objectHash){
    //as per my knowledge, there might be collision for different objects.
    int uniqueId =  System.identityHashCode(newObject);
    if(!objectHash.contains(uniqueId){
        objectHash.add(uniqueId);
    }
}

Is it possible to get a 100% collision proof Id in java i.e different object having different IDs, the same object having same ids irrespective of the content of the object?

aydinugur
  • 1,208
  • 2
  • 14
  • 21
Sharuk Ahmed
  • 825
  • 7
  • 15
  • 2
    `int` is bounded by 32 bit in Java. so technically answer is **No** – jmj Nov 16 '17 at 07:16
  • 2
    **`UUID`** is possible.However I doubt your requirement is really needed, as it is not giving better performance. – Joop Eggen Nov 16 '17 at 07:16
  • 1
    For your example you just need to implement `equals` properly. Why do you need an id? – algrid Nov 16 '17 at 07:19
  • @JigarJoshi Any other primitive data type or data structure is fine. – Sharuk Ahmed Nov 16 '17 at 07:24
  • @JoopEggen It for a theoretical purpose. I want to understand if it can be done irrespective of the performance. If not. why? If yes. How? What the closest solution to this problem? – Sharuk Ahmed Nov 16 '17 at 07:27
  • 1
    I subscribe to the UUID idea. It's 99.999999999% collision proof. – Georgian Nov 16 '17 at 07:27
  • @JoopEggen I haven't research on UUID, but thank you. I will take a look into it. – Sharuk Ahmed Nov 16 '17 at 07:28
  • 1
    @JoopEggen of course `UUID` is *better*, but using a `HashSet` still bounds this to 32 bits - as opposed to 128 that UUID has – Eugene Nov 16 '17 at 07:28
  • @algrid I participated in an online coding contest. My apology that I am unable to reproduce the same problem. If I could somehow get a 100% collision proof solution. I could have solved that problem in o(n). – Sharuk Ahmed Nov 16 '17 at 07:30
  • 2
    Either you're looking for guaranteed generic lossless compression, which is impossible by [the pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle), or it's simply converting data to a set of types of the same size or larger, which is easy enough. Or you may just be misunderstanding how hash-sets work - they are designed to differentiate between non-equal objects with the same hash. – Bernhard Barker Nov 16 '17 at 09:00
  • There is no sense in doing `if(!objectHash.contains(newObject){objectHash.add(newObject);}`. It’s already the contract of `Set.add` to only add the element if not being in the set. It even returns a `boolean` telling you whether it has been added. So you can simply use `objectHash.add(newObject);`, doubling your performance. – Holger Nov 17 '17 at 07:54
  • @Dukeling: perhaps, he’s just looking for [`IdentityHashMap`](https://docs.oracle.com/javase/8/docs/api/?java/util/IdentityHashMap.html)… – Holger Nov 17 '17 at 12:47

1 Answers1

1

Since you put them into a HashSet that uses hashcode/equals and hashCode is 32 bits long - this has a limit; thus collision will happen. Especially since a HashSet actually only cares about n-last bits before making itself bigger in size and thus adding one more bit and so on. You can read a lot more about this here for example.

The question is different here: why you want a collision free structure in the first place? If you define a fairly well distributed hashCode and a fairly decent equals - these things should not matter to you at all. If you worry about performance of a search, it is O(1) for HashSet.

You could define hashCode and equality based on UUID, like let's say UUID#randomUUID - but this still bounds your hashCode to the same 32-bits, thus collision could still happen.

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • I will read about it. I can create a custom data structure. Irrespective of the performance, I am interested in knowing is it possible, at least theoretically? – Sharuk Ahmed Nov 16 '17 at 07:36
  • @SharukAhmed as long as you them to be a `Set` structure - no; all of them use hashcode that is 32 bits (int) long. – Eugene Nov 16 '17 at 13:50