
I want to build a matrix from series, but first I have to resample those series. To avoid processing the whole matrix twice with replace(np.nan, 0.0), I want to append the resampled series to a collecting dataframe and then replace the NaN values in one pass.

So instead of

user_activities = user.groupby(["DOC_ACC_DT", "DOC_ACTV_CD"]).agg("sum")["SUM_DOC_CNT"].unstack().resample("1D").replace(np.nan, 0)
df = df.append(user_activities[activity].rename(user_id))

I want

user_activities = user.groupby(["DOC_ACC_DT", "DOC_ACTV_CD"]).agg("sum")["SUM_DOC_CNT"].unstack().resample("1D")
df = df.append(user_activities[activity].rename(user_id))

but that is not working because user_activities is not a dataframe after resample().

The error suggests that I try apply() but that method expects a parameter:

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _make_wrapper(self, name)
    507                    "using the 'apply' method".format(kind, name,
    508                                                      type(self).__name__))
--> 509             raise AttributeError(msg)
    510 
    511         # need to setup the selection

AttributeError: Cannot access callable attribute 'rename' of 'SeriesGroupBy' objects, try using the 'apply' method

How can I solve this issue?

Asclepius
Stefan Falk
  • What is `user_activities` after `resample`? – IanS Sep 14 '16 at 15:29
  • resample no longer returns a dataframe: it is now lazily evaluated at the moment of the aggregation or interpolation. Depending on your use case, replacing `.resample("1D")` with `.resample("1D").mean()` (i.e. downscaling) or with `.resample("1D").interpolate()` (upscaling) could be what you're after; both return a dataframe. – Svend Sep 15 '16 at 08:57

2 Answers


The interface to .resample changed in pandas 0.18.0 to be more groupby-like and hence more flexible, i.e. resample no longer returns a DataFrame: it is now lazily evaluated at the moment of the aggregation or interpolation.
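To make the deferred evaluation concrete, here is a minimal sketch. The frame and its "count" column are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Toy data (hypothetical): two observations on one day, one two days later
idx = pd.to_datetime(["2016-09-01", "2016-09-01", "2016-09-03"])
frame = pd.DataFrame({"count": [1.0, 2.0, 4.0]}, index=idx)

resampler = frame.resample("1D")  # deferred: a Resampler, not a DataFrame
daily = resampler.sum()           # the aggregation produces the DataFrame

print(type(resampler).__name__)
print(type(daily).__name__)
```

DataFrame methods such as rename or replace only become available once an aggregation or interpolation has been applied to the resampler.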

I suggest reading the resample API changes: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#resample-api

See also:

for upscaling

df.resample("1D").interpolate()

for downscaling

using mean

df.resample("1D").mean()

using OHLC

i.e. the open, high, low, and close values (the first, maximum, minimum, and last value in each bin)

df.resample("1D").ohlc()
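Applied to the question's goal, one possible approach is to materialise each resampled series with an aggregation, collect the results, and fill the NaN values once at the end. The sketch below is only an illustration under assumptions: the helper daily_activity, the user ids, and the sample values are hypothetical, and pd.concat is used for collecting because DataFrame.append has been removed in recent pandas versions.

```python
import pandas as pd

def daily_activity(dates, counts, user_id):
    """Hypothetical helper: one user's counts resampled to daily frequency."""
    s = pd.Series(counts, index=pd.to_datetime(dates), dtype=float)
    # .mean() materialises the resampler; days without data become NaN
    return s.resample("1D").mean().rename(user_id)

# Collect one resampled series per (made-up) user
rows = [
    daily_activity(["2016-09-01", "2016-09-03"], [2, 5], "user_a"),
    daily_activity(["2016-09-02", "2016-09-03"], [1, 4], "user_b"),
]

# One row per user, one column per day; replace all NaN in a single pass
matrix = pd.concat(rows, axis=1).T.fillna(0.0)
print(matrix)
```

Whether mean, sum, or another aggregation is the right choice depends on what the counts represent; with at most one value per day, as here, they give the same result for the days that have data.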
scls

One way is to use .aggregate.

As per the docs, note first that .agg is an alias for it, and is preferred:

agg is an alias for aggregate. Use the alias.

It can be used as in the example below:

df.resample('1D').agg({'close': 'last', 'open': 'first'})

This returns a dataframe.
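A small self-contained run of that call, using made-up price data (the values and timestamps are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical intraday data: two rows on the first day, one on the second
idx = pd.to_datetime(["2016-09-01 09:00", "2016-09-01 17:00", "2016-09-02 09:00"])
prices = pd.DataFrame(
    {"open": [10.0, 11.0, 12.0], "close": [10.5, 11.5, 12.5]},
    index=idx,
)

# Per-column aggregation: last close and first open of each day
daily = prices.resample("1D").agg({"close": "last", "open": "first"})
print(daily)
```

Each key in the dict names a column and each value names the aggregation applied to that column within every daily bin.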

Asclepius