-1

I have learnt some of the hash-join algorithms, and I know there usually are hash tables whose keys are calculated by the hash function. I am wondering that can the hash function be omitted and just use the value instead?

For example, tables user_table

[{"name": "tom", "id": 1}, {"name": "jerry". "id": 2}]

join score_table

[{"score": 5, "id": 1}, {"score": 7, "id": 2}]

on id

can I just use the key id as the hash table key? So I can save the calculation of hash function.

Or it is said that hash function has many kinds and

def hash(id):
    return id

is one of them?

Is there any other needs that I should apply a hash function?

UPDATE

From the discussion with @OmG, I know at least in multiple key join, there must be a hash function to calculate the key.

jerryleooo
  • 843
  • 10
  • 16
  • What did your research reveal? [ask] – philipxy May 21 '19 at 08:34
  • Possible duplicate of [Hashing Algorithm, its uses?](https://stackoverflow.com/questions/2726001/hashing-algorithm-its-uses) – philipxy May 21 '19 at 08:36
  • Hi @philipxy I think I know what hash function is, and my question is what it is used in hash-join algorithms. Now I understand: at least in multiple key join, there must be a hash function to calculate the key. Hope it helps, thanks! – jerryleooo May 21 '19 at 08:40
  • That is not clear from your question, and if that's your question, please edit your post to be clear. But then you should have researched hash join algorithms, and as I already asked, what that research revealed should be in your post also. However, your question right now asks can you not use a hash function, but hashing needs a hash function, so you seem to be asking about hashing & you don't show you have researched hashing. – philipxy May 21 '19 at 08:42
  • Please don't insert EDITs or UPDATEs, edit your post to be the best presentation you can. – philipxy May 21 '19 at 08:49
  • Hi @philipxy I add the link for hash-join so people know what I am talking about. I was just not very clear about the hash function usage in hash-join algorithm. The hash function I refer is a general process of hash. I am not sure which part you are not clear. – jerryleooo May 21 '19 at 08:55
  • And from the answer below, I think @OmG clearly know what I am talking about... – jerryleooo May 21 '19 at 08:56
  • Let me rephrase. Using a hash table requires a hash function. Read a presentation of hash join & find the part that says why a hash function & hash table is used. If you think it doesn't explain then paraphrase/quote it & ask about it so we can clarify it. Don't ask us to write yet another presentation for you to not understand. PS The answer below is wrong. – philipxy May 21 '19 at 09:07
  • Your own link says "Because the hash table is accessed by applying a hash function to the join attribute, it will be much quicker to find a given join attribute's rows by using this table than by scanning the original relation." So what did you learn about why hash tables use hash functions when you then researched hash tables? – philipxy May 21 '19 at 09:10
  • @philipxy Thanks for your rephrase, the fact is I have implemented a join algorithm, it is like "classic hash join" but without a hash function, so I am confused about the usage of hash function in hash-join algorithm. I could not find a better name for the algorithms I implemented so I use the name "hash-join", and it IS a hash-join since the hash function I used is returning the input itself. – jerryleooo May 21 '19 at 09:10
  • I know what is hash table, my focus is the key calculation. – jerryleooo May 21 '19 at 09:15
  • Your own link: "First prepare a hash table". I just told you that for use of a hash table to have its desired computational complexity you need apply a hash function & that you will find out why in a presentation of hash tables. Googling 'site:stackoverflow.com why do i need a hash function for my hash table' ... [What's the point of a hash table?](https://stackoverflow.com/q/2179965/3404097) [Can hash tables really be O(1)?](https://stackoverflow.com/q/2771368/3404097) [How does a hash table work?](https://stackoverflow.com/q/730620/3404097) This post "does not show any research effort". – philipxy May 21 '19 at 10:23

1 Answers1

0

It backs to the definition of the hash-join algorithm. Actually, the philosophy of using the hash function in the hash-join algorithm backs to handling join attributes more efficiently. Here, in your specific case, as you have not any extra attribute on the join, and just using from the unique id, you do not need any more hash function or in the otherwords, your hash function could be the identity function.

However, you should notice that this solution is a domain specific solution and cannot be generalized for more complex cases without having a more complex hash function.

OmG
  • 18,337
  • 10
  • 57
  • 90
  • Thanks @Omg, may be a multiple key join is an example of "complex cases"? – jerryleooo May 21 '19 at 06:45
  • @Jerry yes. exactly. – OmG May 21 '19 at 06:47
  • @Jerry & Omg Using a hash table requires a hash function--so that initial values are evenly distributed to buckets. See the duplicate link or any presentation of hash table use, like the hash join algorithm. – philipxy May 21 '19 at 09:00