I have a dataframe, and I want to group by some attributes and calculate the rolling mean of a numerical column in Dask. I know Dask has no built-in groupby rolling, but I read an SO question which shows it is possible.

Dask rolling function by group syntax

When I use the same syntax as in that post, I get an error:

UnpicklingError: invalid load key, '�'.

I do not understand why I am getting an unpickling error. The call is:

df.groupby(by=path)[metric].apply(lambda df_g: df_g[metric].rolling(5).mean(), meta=(metric, 'f8')).compute()

where path is a list of attribute columns and metric is the numeric column.

I also tried the following:

def moving_avg(partition):
    return partition.rolling(5).mean()

df.groupby(by=path)[metric].apply(moving_avg, meta='f8').compute()

In PySpark I use the rolling average function by defining the partitions with a groupby and then rolling a window over them.

Sample data:

           CATEGORY_NAME               MKT   ...         Growth   Sales
Date                                         ...                       
2017-01-07            TP              SIMS   ...         0.0000   17280
2017-01-07            TP           TOPRITE   ...        -0.4566    1825
2017-01-07            TP       GIANT HYPER   ...         0.0874   18417
2017-01-07            TP       GIANT HYPER   ...        -0.1359   10914
2017-01-07            TP       GIANT HYPER   ...         0.2245    4422
2017-01-07            TP           TOPRITE   ...         0.1084    1444
2017-01-07            TP       GIANT HYPER   ...         0.0542   18412
2017-01-07            TP            FENCER   ...         0.2766   25184
2017-01-07            TP       GIANT HYPER   ...        -0.0572   19466
2017-01-07            TP           TOPRITE   ...         0.1795    1503
2017-01-07            TP       GIANT HYPER   ...         0.0770   13615

Say I want to groupby ["CATEGORY_NAME", "MKT"] and take a rolling average of Sales.
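
For reference, here is a minimal pandas sketch of the result being asked for. It is not a Dask solution. The data is a made-up miniature of the sample (same column names, hypothetical dates and Sales values), and the window is 2 instead of 5 so the small example produces non-NaN values. One thing it makes visible: after selecting the Sales column on the groupby, each group handed to the function is already a Series, so the function rolls over it directly rather than indexing by column name again.

```python
import pandas as pd

# Hypothetical miniature of the sample data (dates and Sales values are made up).
pdf = pd.DataFrame(
    {
        "CATEGORY_NAME": ["TP"] * 6,
        "MKT": ["GIANT HYPER", "TOPRITE"] * 3,
        "Sales": [100, 10, 200, 30, 300, 50],
    },
    index=pd.DatetimeIndex(
        ["2017-01-07", "2017-01-07", "2017-01-14",
         "2017-01-14", "2017-01-21", "2017-01-21"],
        name="Date",
    ),
)


def moving_avg(sales, window=5):
    # `sales` is one group's Sales column (a Series), so roll over it directly.
    return sales.rolling(window).mean()


# Rolling mean of Sales within each (CATEGORY_NAME, MKT) group,
# returned in the original row order.
rolled = pdf.groupby(["CATEGORY_NAME", "MKT"])["Sales"].transform(
    lambda s: moving_avg(s, window=2)
)
print(rolled)

# Assumption: with a Dask dataframe `ddf`, the same function would be passed
# through groupby-apply as in the question, e.g.
#   ddf.groupby(path)[metric].apply(moving_avg, meta=(metric, "f8"))
```

The first row of each group is NaN because a window of 2 has no earlier value to average with.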

pissall

1 Answer

See Dask rolling function by group syntax for the answer. The UnpicklingError is unrelated to the question.