Fairly new to Dask but just wondering why it is behaving in such strange way. Essentially, I create a new column with random uuids and join it to another dask dataframe. For some odd reason the uuids keep changing and not sure if I am missing something?
This is a representation of my code:
def generate_uuid() -> str:
""" generates uuid4 id """
return str(uuid4())
my_dask_data = dd.from_pandas(my_pandas_data, npartitions=4)
my_dask_data["uuid"] = None
my_dask_data["uuid"] = my_dask_data.apply(generate_uuid, axis=1, meta=("uuid"), "str"))
print(my_dask_data.compute())
And this is the output:
name uuid
my_name_1 16fb858c-bbed-413b-a415-62099ee2c455
my_name_2 9acd0a22-9b19-4db6-9759-b70dc0353710
my_name_3 5d610aaf-a813-4d0b-8d83-8f11fe400c7e
Then, I do a concat with other dask dataframe:
joined_data = dd.concat([my_dask_data, my_other_dask_data], axis=1)
print(joined_data.compute())
This is the output, which for some reason it produces new uuids:
name uuid tests
my_name_1 f951cefa-1145-411c-96f6-924730d7cb22 test1
my_name_2 88e28e5f-42ea-4fbe-a036-b8179a0ba3f8 test2
my_name_3 50e70fac-da19-4d2f-b6ea-80da41591ac5 test3
Any thoughts on how to keep the same uuids without changing?