5

I am facing a challenge that's driving me crazy after two hours of trial and error...

I need to hash at least two columns of a relational table with presto (actually with Amazon Athena which uses the presto engine).

My current state is this:

SELECT concat(lower(to_hex(md5(to_utf8(trim(column1))))),
              lower(to_hex(md5(to_utf8(trim(column2)))))) AS HK
FROM table
limit 10

The issue with this is, it hashes the columns first and afterwards concatenates them what ends up looking like this, because it concatenates the hashes:

8f9bfe9d1345237cb3b2b205864da075ce8ae9da5b7cd6c3df2929543a9af92d

Instead of concatenating the strings first and then hashing it what should end up like this:

8f9bfe9d1345237cb3b2b205864da075

I'd be very grateful for any ideas on how to solve this problem.

EDIT:

I got a solution, but only without using the trim() function, with it the query does not work and results in an error message

INVALID_FUNCTION_ARGUMENT: There must be two or more concatenation arguments

Current query:

SELECT lower(to_hex(md5(to_utf8(concat(user, email))))) AS UserMailHK FROM table limit 10

Any ideas on this?

MConan
  • 171
  • 1
  • 2
  • 8

1 Answers1

11

I got a solution thanks to Piotr!

SELECT lower(to_hex(md5(to_utf8(concat(trim(user), trim(email)))))) AS UserMailHK FROM table limit 10 
MConan
  • 171
  • 1
  • 2
  • 8
  • The above won't give you any hash value if either of `user` or `email` is null. You can use the `coalesce` function to replace the null value with an empty string (Source: https://stackoverflow.com/a/43275417/12242023). – Prabhatika Vij Feb 02 '22 at 16:20