How to create a unique ID column in a Dask cuDF DataFrame across all partitions?
So far I am using the following technique, but when I increase the data to more than 10 crore (100 million) rows, it gives me a memory error.
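import cupy

def unique_id(df):
    # Number the rows 0..len(df)-1; this is only globally unique with a single partition
    rag = cupy.arange(len(df))
    df['unique_id'] = rag
    return df

part = data.npartitions
# Collapse everything into one partition so the row numbers are unique
data = data.repartition(npartitions=1)
cols_meta = {c: str(data[c].dtype) for c in data.columns}
data = data.map_partitions(unique_id, meta={**cols_meta, 'unique_id': 'int64'})
# Restore the original number of partitions
data = data.repartition(npartitions=part)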
If there is another way to do this, or a modification to this code that avoids the memory error, please suggest it. Thank you for your help.
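One direction that might avoid the single-partition step is to give each partition its own starting offset, so that every partition numbers its rows independently. The following is only a sketch under assumptions: `partition_offsets` and `add_unique_id` are hypothetical helper names, `ddf` is assumed to be a dask_cudf DataFrame, and the code relies on `map_partitions` passing a `partition_info` keyword to the mapped function. It has not been run on the data described above.

```python
from itertools import accumulate


def partition_offsets(lengths):
    """Return the first ID of each partition, given per-partition row counts."""
    totals = list(accumulate(lengths))  # running totals, e.g. [3, 2, 4] -> [3, 5, 9]
    # Subtract each partition's own length to get where it starts: [0, 3, 5]
    return [t - n for t, n in zip(totals, lengths)]


def add_unique_id(ddf):
    """Hypothetical helper: add a globally unique 'unique_id' column.

    Assumes `ddf` is a dask_cudf DataFrame and that cupy is installed.
    """
    import cupy  # imported lazily so partition_offsets works without a GPU

    # Only one integer per partition is brought back to the client.
    lengths = [int(n) for n in ddf.map_partitions(len).compute()]
    offsets = partition_offsets(lengths)

    def _assign(df, partition_info=None):
        # partition_info["number"] is the position of this partition;
        # partition_info is None when Dask probes the function to infer metadata.
        start = offsets[partition_info["number"]] if partition_info else 0
        return df.assign(unique_id=cupy.arange(start, start + len(df)))

    return ddf.map_partitions(_assign)
```

With this sketch, `data = add_unique_id(data)` would replace the two `repartition` calls, so no single GPU has to hold all rows at once. Counting the rows triggers an extra pass over the data, so persisting `data` beforehand may be worthwhile if it is expensive to recompute.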