
I have a dataframe with the columns group_, vals_ and date_.

I would like to group by group_ and then output, for each group, a statistic that is conditional on the dates: for instance, the mean of all vals_ within a group whose associated date is before some date.

I tried

df_.groupby(group_).agg(lambda x: x[x['date_']< some_date][vals_].mean()) 

But this fails. I believe it is because x is a Series rather than a DataFrame. Is this correct? Is it possible to achieve this with groupby at all?

clog14
  • what about `df_.groupby(group_).agg(lambda x: x.loc[x['date_']< some_date, vals_].mean())` ? – jezrael Feb 27 '17 at 14:28
  • Thanks. Before I try this, I just realized that the x are all of type series instead of dataframe. Is this expected here? – clog14 Feb 27 '17 at 14:29
  • Hmmm, it seems not, try `apply` instead of `agg` – jezrael Feb 27 '17 at 14:31
  • Thanks, ok this was the big error in reasoning I made. However, I am trying to produce basically summary statistics where I get for each group the mean above and below a certain date threshold and for the whole date range. Additionally, I might like to output some other summary statistics on possible other columns. Is this possible at all with this approach? – clog14 Feb 27 '17 at 14:35
  • I think it can work. If you have something more complicated, try a custom function like: `df_.groupby(group_).apply(f)` `def f(x): x1 = x.loc[x['date_']< some_date, vals_].mean() return x1` – jezrael Feb 27 '17 at 14:38

1 Answer


You can write it differently:

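import pandas as pd

def summary(sub_df):
    # Boolean masks for rows strictly before / strictly after the threshold;
    # rows dated exactly some_date only count towards the overall mean.
    bool_before = sub_df["date_"] < some_date
    bool_after = sub_df["date_"] > some_date

    before = sub_df.loc[bool_before, vals_].mean()
    after = sub_df.loc[bool_after, vals_].mean()
    overall = sub_df.loc[:, vals_].mean()

    # The keys of the returned Series become the columns of the result.
    return pd.Series({"before": before, "after": after, "overall": overall})

# apply passes each group's sub-DataFrame to summary.
result = df_.groupby(group_).apply(summary)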

The result is a DataFrame indexed by group, with one column each for the before, after and overall means.
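As for why `apply` is used here rather than `agg`: when given a plain function, `agg` calls it once per column with a Series, whereas `apply` hands it the whole sub-DataFrame of each group. A small probe on made-up data can confirm this. The toy frame and the `probe_*` helpers below are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the question.
df_ = pd.DataFrame({
    "group_": ["a", "a", "b", "b"],
    "date_": pd.to_datetime(
        ["2017-01-01", "2017-03-01", "2017-01-15", "2017-04-01"]
    ),
    "vals_": [1.0, 3.0, 10.0, 30.0],
})

seen_by_agg, seen_by_apply = [], []

def probe_agg(x):
    # Record what agg passes in, then return the first element so the
    # result keeps the dtype of the column.
    seen_by_agg.append(type(x).__name__)
    return x.iloc[0]

def probe_apply(x):
    # Record what apply passes in, then return the number of rows.
    seen_by_apply.append(type(x).__name__)
    return len(x)

# Selecting the non-key columns explicitly keeps both calls comparable.
grouped = df_.groupby("group_")[["date_", "vals_"]]
grouped.agg(probe_agg)
grouped.apply(probe_apply)

print("agg passes:", set(seen_by_agg))
print("apply passes:", set(seen_by_apply))
```

This is why `x['date_']` is not available inside a function passed to `agg`: each call only sees one column at a time.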

If you require additional summary statistics, compute them inside the summary function and add them as further entries of the returned Series.
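A minimal sketch of such an extended summary, on made-up data. The column `other_`, the threshold value and the statistic names are assumptions chosen for illustration. Note that this sketch uses `~before` for the "after" part, so rows dated exactly `some_date` fall into "after", which differs slightly from the strict `>` above.

```python
import pandas as pd

# Hypothetical toy data; "other_" stands in for any additional column.
df_ = pd.DataFrame({
    "group_": ["a", "a", "a", "b", "b"],
    "date_": pd.to_datetime(
        ["2017-01-01", "2017-02-01", "2017-03-01", "2017-01-15", "2017-04-01"]
    ),
    "vals_": [1.0, 2.0, 6.0, 10.0, 30.0],
    "other_": [5, 7, 9, 2, 4],
})
some_date = pd.Timestamp("2017-02-15")  # assumed threshold

def summary(sub_df):
    # True for rows dated before the threshold; ~before covers the rest.
    before = sub_df["date_"] < some_date
    return pd.Series({
        "mean_before": sub_df.loc[before, "vals_"].mean(),
        "mean_after": sub_df.loc[~before, "vals_"].mean(),
        "mean_overall": sub_df["vals_"].mean(),
        "n_before": before.sum(),             # number of rows before the threshold
        "other_max": sub_df["other_"].max(),  # a statistic on another column
    })

# Selecting the columns explicitly keeps the grouping key out of sub_df.
result = df_.groupby("group_")[["date_", "vals_", "other_"]].apply(summary)
print(result)
```

Each key of the returned Series becomes one column of `result`, so adding a statistic only means adding one more entry.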

pansen
  • Hi Pansen, thanks. Can you see the discussion under the original question? I think this won't work in that case – clog14 Feb 27 '17 at 14:36
  • @clog14 I adjusted my answer to your new description. Can you update your question description, too? – pansen Feb 27 '17 at 14:45
  • hi pansen, many thanks. i will try in the application and eventually update the question with a complete toy example. thx clog – clog14 Feb 27 '17 at 16:17
  • @clog14 Ok cool - just updated `summary` function for better clarity. – pansen Feb 27 '17 at 16:52