
I struggled for a while to find working syntax for computing a rolling function by group on a dask DataFrame. The documentation is excellent, but it does not include an example for this case.

The working version I have is as follows, reading from CSV files that contain a text column of User ids and numeric x, y, and z columns:

import dask.dataframe as dd
ddf = dd.read_csv('./*.csv')
# 5-row rolling mean of x, computed separately for each User
ddf.groupby(ddf.User).x.apply(lambda x: x.rolling(5).mean(), meta=('x', 'f8')).compute()
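
To make explicit what this expression is meant to compute, here is a minimal plain-Python sketch of a rolling mean taken separately per group. It does not use dask at all; the function name `rolling_mean_by_group`, the sample rows, and the window of 3 are hypothetical and chosen only for illustration. It assumes rows are processed in their original order and that incomplete windows yield no value, which mirrors the NaN that `rolling(n).mean()` produces with its default settings.

```python
from collections import defaultdict, deque


def rolling_mean_by_group(rows, window):
    """Return one rolling mean per input row, computed separately per user.

    rows: iterable of (user, x) pairs in their original order.
    A position yields None until its group has seen `window` values.
    """
    # Keep only the last `window` values seen for each user.
    recent = defaultdict(lambda: deque(maxlen=window))
    result = []
    for user, x in rows:
        buf = recent[user]
        buf.append(x)
        # Emit a mean only once the window for this user is full.
        result.append(sum(buf) / window if len(buf) == window else None)
    return result


# Hypothetical sample data: two users with interleaved rows.
rows = [("a", 1.0), ("b", 10.0), ("a", 2.0), ("a", 3.0),
        ("b", 20.0), ("b", 30.0), ("a", 4.0)]
means = rolling_mean_by_group(rows, window=3)
print(means)
```

Each user's window only ever contains that user's values, so interleaved rows from other users do not affect the result.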

Is this the recommended syntax for rolling functions applied by group within dask DataFrames, or is there a recommended alternative?

cwallenpoole
J. Patanian

1 Answer


In order to retain the groups in the result:

ddf.groupby(by='User').apply(lambda df_g: df_g['x'].rolling(5).mean(), meta=('x', 'f8')).compute()
  • This is not working for me. Gives me `UnpicklingError: invalid load key, '�'.` – pissall Dec 24 '18 at 07:54
  • @pissall Try loading a small subset of your csv file. I suspect it has something to do with bad data in your csv. You can use the nrows parameter on the read_csv function for this. If that does not help, could you post some more information? – Paul-Armand Verhaegen Dec 29 '18 at 14:00