-1

Because of the word limit on the title, I could not specify what I meant by "perfect hash function" and "best hash function".

h, the perfect hash function is only perfect regarding the impossibility of collision, not its distribution.

The hash function of c3 should be solely implemented based on its two fields of type c1 and c2, and it should be able to avoid collisions, but not necessarily offer a good distribution of hashes.

Whatever the answer, I would like an explanation for the Math behind the function, and why it avoids collisions, so I can aim for that same Math next time I'm in a similar situation.

(The code bellow is written in C#, but the problem is general enough to not award it the [C#] tag)

EDIT: Note that all hash functions involved need to return a 32bits integer.

class C3
{
    C1 _c1Field;
    C2 _c2Field;
    public override int GetHashCode()
    {
        //what to do here?
    }
}

EDIT2: The thing about claiming my question is a duplicate is this: I don't want a good hash function according to normal standards. I thought I made that clear enough, but apparently I didn't. I need a hash function that completely ignores uniformity of distribution. I don't need to care about that at all. I just need a function that avoids collision. I'm not asking it to be perfect, and to never collide, that would be silly. I'm asking for a function that, given two images under our ideal (and purely hypothetical) h function, would output an image with a minimized chance of collision with another, different pair of images under h.

I don't know any other name for that kind of function, so I'm calling it a hash function. It may be that the problem lies with my lack of specialized vocabulary, as I only code as hobby and out of love, not for a living.

FinnTheHuman
  • 1,115
  • 13
  • 29
  • 1
    when you say c1 and c2 implement perfect hash functions, do you mean for some minimum number of possible hash values? Say you only need to hash c1 and c2 down to 32 bits. Then you could do hash(_c1Field) left shifted by 32 and hash(_c2Field) and then you could create a 64-bit hash for C3 by bitwise or – Ryan Stout Sep 03 '17 at 23:26
  • 1
    There can't be a general answer to this, imagine the C1 and C2 hashes use the full range of int. – pvg Sep 03 '17 at 23:27
  • @RyanStout Apparently, the C# tag applies then. I need to override the `GetHashCode` method, so I need c3 hash to be 32 bits only. I'm sorry, I'll edit the question to make that clearer. – FinnTheHuman Sep 03 '17 at 23:28
  • @pvg Yes, let us imagine that, since my question doesn't specify any range. Isn't there a mathematical function which gives an ideal hash, collision wise in that case? – FinnTheHuman Sep 03 '17 at 23:33
  • 1
    @FinnTheHuman if i have a hash function with output range 1 to 10, what mathematical magic can give you collision free output for 20 items? – pvg Sep 03 '17 at 23:35
  • @pvg it should only *minimize* collision. I understand not even *h*, the collision free function, exists in reality. Assuming *h* exists, which function could we apply to two images under *h* that would minimize collision cases? That's the question. – FinnTheHuman Sep 03 '17 at 23:39
  • 1
    There are lots of functions like that, _Effective Java_ has a good example and there are many variants of this question on SO, for instance https://stackoverflow.com/questions/2738886/what-is-a-best-practice-of-writing-hash-function-in-java – pvg Sep 03 '17 at 23:44
  • Possible duplicate of [What is a best practice of writing hash function in java?](https://stackoverflow.com/questions/2738886/what-is-a-best-practice-of-writing-hash-function-in-java) – pvg Sep 03 '17 at 23:44
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/153578/discussion-between-finnthehuman-and-pvg). – FinnTheHuman Sep 03 '17 at 23:46
  • @RyanStout actually, on second look, your idea is great. I can count on Int64 hash function to be well implemented, and I can do exactly what you said, with the extra step of then invoking `GetHashCode` on the resulting 64bits integer! You should write that as an answer, so I can accept it. – FinnTheHuman Sep 04 '17 at 00:35

1 Answers1

0

You could hash both _c1Field and _c2Field to 32 bits. Left shift one of them by 32 bits and then add the left shifted value to the unshifted value to get a 64 bit integer. Then you can hash the 64 bit integer down to 32 bits using a standard hash function as necessary.

Ryan Stout
  • 1,038
  • 6
  • 13