
It would be great if someone could help me with the following problem.


How do I populate the ID column with a value computed from all the columns in the respective row? Something like a CRC32 or MD5 hash.

Koppula

1 Answer


Using `hashlib`, you could convert each row's dictionary of values to a string and compute the MD5 hash of that string.

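```python
import hashlib

# df is the existing DataFrame; hash each row's {column: value} dict (as a string) with MD5
df['Id'] = [hashlib.md5(str(x).encode('utf-8')).hexdigest() for x in df.T.to_dict().values()]
```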
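A self-contained sketch of the same approach, assuming a small made-up DataFrame (the columns `name` and `amount` are hypothetical, not taken from the question). It also prints what `df.T.to_dict()` produces: a dict keyed by the original row index whose values are `{column: value}` dicts, one per row.

```python
import hashlib

import pandas as pd

# Hypothetical sample data; the real columns come from the asker's DataFrame
df = pd.DataFrame({"name": ["alice", "bob", "alice"], "amount": [10, 20, 10]})

# df.T.to_dict() -> {row_index: {column: value, ...}, ...}
rows = df.T.to_dict()
print(rows[0])  # the {column: value} dict for row 0

# MD5 of each row's dict rendered as a string (column names are part of the hashed text)
df["Id"] = [hashlib.md5(str(x).encode("utf-8")).hexdigest() for x in rows.values()]

# Identical rows get identical Ids; different rows get different ones
print(df["Id"][0] == df["Id"][2])  # True
print(df["Id"][0] == df["Id"][1])  # False
```

If the `Id` column already exists when the hash is computed, its contents are hashed too, so compute the hash before adding the column or drop it first (e.g. with `df.drop(columns='Id')`).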
oskros
  • Thanks @oskros. Your code seems to work for me, but I need to test it with more cases. One question: what does "T" stand for in "df.T.to_dict()"? – Koppula Jan 18 '21 at 13:21
  • `T` stands for transpose, so I am switching the rows and columns of the DataFrame. I do this because the built-in pandas `to_dict()` returns, for each column, a dictionary of that column's values keyed by row. Instead we want, for each row, a dictionary of its column values, and the transpose gives us exactly that. – oskros Jan 18 '21 at 13:25
  • Thanks for the clarification @oskros. If I want to use the CRC32 algorithm, can I use hashlib? – Koppula Jan 18 '21 at 13:38
  • https://stackoverflow.com/questions/30092226/how-to-calculate-crc32-with-python-to-match-online-results – oskros Jan 18 '21 at 13:39
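`hashlib` does not provide CRC32; Python's standard library exposes it as `zlib.crc32` (and `binascii.crc32`). A minimal sketch, assuming the same hypothetical sample DataFrame, that swaps MD5 for CRC32 while keeping the row-to-string step unchanged. Formatting the unsigned 32-bit result as 8 hex digits is a choice made here, not something the thread specifies.

```python
import zlib

import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame
df = pd.DataFrame({"name": ["alice", "bob", "alice"], "amount": [10, 20, 10]})

# CRC32 of each row's {column: value} dict rendered as a string,
# formatted as 8 zero-padded hex digits (zlib.crc32 returns an unsigned int in Python 3)
df["Id"] = [
    format(zlib.crc32(str(x).encode("utf-8")), "08x")
    for x in df.T.to_dict().values()
]

# Sanity check: the standard CRC-32 check value for b"123456789"
assert zlib.crc32(b"123456789") == 0xCBF43926

print(df)
```

A CRC32 has only 2^32 possible values, so with many rows collisions are far more likely than with MD5; it works as a quick checksum rather than a unique row ID.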