I am using pyspark.sql.functions.hash
on a given set of columns, and expect to get a different output for different rows.
I noticed that I am getting the same hash back even though the values in the input rows were different.
Is this expected, or is it a bug?
from pyspark.sql import functions as F
df = df.withColumn("my_key", F.hash(*["some other columns"]))
This obviously doesn't happen all the time, so it is hard to reproduce.
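For a sense of scale, here is a rough back-of-the-envelope sketch of how often two different rows might share a hash purely by chance. It assumes the hash values are uniformly distributed 32-bit integers (the function is documented as returning an int column) and uses the standard birthday approximation. The helper name collision_probability is hypothetical and not part of PySpark.

```python
import math


def collision_probability(n_rows: int, hash_bits: int = 32) -> float:
    """Approximate chance that at least two of n_rows distinct inputs
    share a hash value, assuming a uniform hash_bits-wide hash
    (birthday approximation: 1 - exp(-n(n-1) / (2 * 2**bits)))."""
    buckets = 2 ** hash_bits
    pairs = n_rows * (n_rows - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x
    return -math.expm1(-pairs / buckets)


if __name__ == "__main__":
    # Row counts here are illustrative, not taken from my data
    for n in (1_000, 100_000, 1_000_000):
        print(n, round(collision_probability(n), 4))
```

If this assumption holds, occasional identical hashes for different rows would be consistent with ordinary collisions at larger row counts rather than a bug, which would also fit with it being hard to reproduce on small samples.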