Say I have multiple CSVs in the following format:
2018/05/11T00:05:45,true,happy
2018/05/11T01:33:45,false,mad
2018/05/11T02:23:45,true,sleepy
Assume that duplicate rows exist across the collection of CSV files. I will be ingesting the data into Elasticsearch, though not all at once. For example, I could ingest 3 CSV files today and 3 different files tomorrow. Further, I may not have access to tomorrow's files yet but must ingest today's files today, so I can't diff today's files against tomorrow's. There could still be duplicate rows across both sets of files, hence the need to generate a deterministic `_id` per row before ingest time to prevent duplicates in the Elasticsearch index.
Using Python, how can I create a GUID for each row such that identical rows always receive the same ID, letting me identify all of the duplicates?
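To show what I have in mind, here is a minimal sketch using only the standard library. It derives a GUID-shaped ID from each row's content with `uuid.uuid5`, so identical rows always map to the same ID; the namespace UUID and the field separator are arbitrary choices on my part, not requirements:

```python
import csv
import io
import uuid

# Any fixed namespace works, as long as it never changes between ingests.
ROW_NAMESPACE = uuid.NAMESPACE_DNS

def row_id(row):
    """Deterministic, GUID-shaped ID derived from the row's content."""
    # Join fields with a separator unlikely to appear in the data,
    # so ("a,b", "c") and ("a", "b,c") do not produce the same key.
    key = "\x1f".join(row)
    return str(uuid.uuid5(ROW_NAMESPACE, key))

sample = """2018/05/11T00:05:45,true,happy
2018/05/11T01:33:45,false,mad
2018/05/11T00:05:45,true,happy
"""

ids = [row_id(row) for row in csv.reader(io.StringIO(sample))]
# The first and third rows are identical, so they share an ID by design.
```

Using that value as the document `_id` at ingest time would (I believe) make repeated rows overwrite each other rather than duplicate, but I'd like to know if this is a sound approach.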