How to reproduce Spark hash function using Python?

Asked Jul 19 '23 at 06:58

Active Jul 19 '23 at 10:02

Viewed 27 times

Spark uses the Murmur3 algorithm to compute hash values. I tried the Python mmh3 package to produce the same hash, but it returns a different value from Spark for the same input.

I've read a lot of relevant questions about Spark's hash algorithm, but I still don't know how to get the same hash value in pure Python:

What hash algorithm is used in pyspark.sql.functions.hash?
Hash function in spark
Scala MurmurHash3 library not matching Spark Hash function
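For context, here is a rough pure-Python sketch of what I understand Spark to be doing, based on my reading of its `Murmur3_x86_32` class. As far as I can tell, the mismatch with mmh3 comes from two things: `pyspark.sql.functions.hash` uses seed 42, and for byte strings whose length is not a multiple of 4, Spark mixes each trailing byte as its own sign-extended block instead of using the standard Murmur3 tail step. The function names below (`spark_hash_bytes`, `spark_hash_string`, `spark_hash_int`) are my own, not part of any library, and I have not verified the output against a running cluster.

```python
import struct

C1 = 0xCC9E2D51
C2 = 0x1B873593
MASK = 0xFFFFFFFF
SPARK_DEFAULT_SEED = 42  # assumption: seed used by pyspark.sql.functions.hash


def _rotl32(x, r):
    # 32-bit rotate left
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK
    k1 = _rotl32(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1, length):
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def _to_signed32(x):
    # Spark returns a JVM int, so map the unsigned result to a signed 32-bit value
    return x - (1 << 32) if x & 0x80000000 else x


def spark_hash_bytes(data, seed=SPARK_DEFAULT_SEED):
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    # Full 4-byte blocks, read as little-endian ints (same as standard Murmur3)
    for i in range(0, aligned, 4):
        (block,) = struct.unpack_from("<I", data, i)
        h1 = _mix_h1(h1, _mix_k1(block))
    # Trailing bytes: each one is sign-extended and mixed as a whole block,
    # which is where the result diverges from standard Murmur3
    for b in data[aligned:]:
        signed = b - 256 if b >= 128 else b
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    return _to_signed32(_fmix(h1, len(data)))


def spark_hash_string(s, seed=SPARK_DEFAULT_SEED):
    # Spark stores strings as UTF-8 bytes
    return spark_hash_bytes(s.encode("utf-8"), seed)


def spark_hash_int(value, seed=SPARK_DEFAULT_SEED):
    # A 32-bit int is mixed as a single block with length 4
    h1 = _mix_h1(seed & MASK, _mix_k1(value & MASK))
    return _to_signed32(_fmix(h1, 4))
```

If this reading is right, `spark_hash_string("some value")` should match `hash(col)` on a single string column. For several columns, my understanding is that Spark feeds the hash of one column in as the seed for the next and skips nulls, so the calls would need to be chained in column order.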

edited Jul 19 '23 at 10:02

Gino Mempin


asked Jul 19 '23 at 06:58

Viperl

0 Answers