I used this solution to compute the value_counts of a column in Pandas and store the results in a new column.
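For reference, the Pandas version I have in mind looks roughly like the sketch below. The sample data is made up for illustration; only the column names A, B and new_column match my real frame.

```python
import pandas as pd

# Hypothetical sample data; only the column names A and B come from the real frame.
df = pd.DataFrame({"A": ["x", "y", "x", "x", "y"], "B": [1, 2, 3, 4, 5]})

# For every row, count how many rows share its value of A
# (transform returns a result aligned to the original index).
df["new_column"] = df.groupby("A")["B"].transform("count")
print(df)
```

In plain Pandas this works because the transformed Series has the same index as df, so the assignment aligns row by row.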
Now I'm trying to do the same for a Dask DataFrame, but it raises the following error:
df['new_column'] = df.groupby(['A'])['B'].transform('count', meta='int').compute()
ValueError: cannot reindex from a duplicate axis
P.S. The df dataframe has four partitions.
How can I compute the value counts of column A and store them in new_column in Dask, the same way as in this answer?