Dask rolling function by group syntax

Question

I struggled for a while to find the syntax for calculating a rolling function by group on a dask DataFrame. The documentation is excellent, but it does not have an example for this case.

The working version I have is as follows, reading from csv files that contain a text field with User ids and x, y, and z columns:

import dask.dataframe as dd

ddf = dd.read_csv('./*.csv')
ddf.groupby(ddf.User).x.apply(lambda x: x.rolling(5).mean(), meta=('x', 'f8')).compute()

Is this the recommended syntax for rolling functions applied by group within dask DataFrames, or is there a recommended alternative?

score 2 · Answer 1 · answered Jul 18 '18 at 08:58

In order to retain the groups in the result:

ddf.groupby(by='User').apply(lambda df_g: df_g['x'].rolling(5).mean(), meta=('x', 'f8')).compute()

Paul-Armand Verhaegen

This is not working for me. Gives me `UnpicklingError: invalid load key, '�'.` – pissall Dec 24 '18 at 07:54

@pissall Try loading a small subset of your csv file. I suspect the error has something to do with bad data in your csv. You can use the nrows parameter of the read_csv function for this. Otherwise, could you post some more info? – Paul-Armand Verhaegen Dec 29 '18 at 14:00
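
A minimal sketch of the subset check suggested in the comment above, assuming the pandas `read_csv` function is meant, where `nrows` is a documented parameter. The inline csv text is made-up data standing in for one of the real files.

```python
import io
import pandas as pd

# Made-up csv content with the columns described in the question.
csv_text = (
    "User,x,y,z\n"
    "u1,1.0,0.1,5\n"
    "u1,2.0,0.2,6\n"
    "u2,3.0,0.3,7\n"
    "u2,4.0,0.4,8\n"
)

# Read only the first 2 data rows to see whether the file parses cleanly
# and whether the inferred dtypes look as expected.
sample = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(sample.dtypes)
print(sample)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.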
