Python - Pandas - Groupby conditional on column values in group

Question

I have a dataframe with the columns group_, vals_ and dates_.

I would like to perform a groupby operation on group_ and subsequently output for each group a statistic conditional on dates. For instance, the mean of all vals_ within a group whose associated date is below some date.

I tried

df_.groupby(group_).agg(lambda x: x[x['date_']< some_date][vals_].mean())

But this fails. I believe it is because x is not a dataframe but a series. Is this correct? Is it possible to achieve what I am trying to achieve here with groupby?

what about `df_.groupby(group_).agg(lambda x: x.loc[x['date_']< some_date, vals_].mean())` ? — jezrael, Feb 27 '17 at 14:28
Thanks. Before I try this, I just realized that the x are all of type series instead of dataframe. Is this expected here? — clog14, Feb 27 '17 at 14:29
Thanks, ok this was the big error in reasoning I made. However, I am trying to produce basically summary statistics where I get for each group the mean above and below a certain date threshold and for the whole date range. Additionally, I might like to output some other summary statistics on possible other columns. Is this possible at all with this approach? — clog14, Feb 27 '17 at 14:35
I think it can work. If you have something more complicated, try a custom function like `def f(x): return x.loc[x['date_'] < some_date, vals_].mean()` and pass it to `df_.groupby(group_).apply(f)` – jezrael, Feb 27 '17 at 14:38

pansen · Answer 1 · 2017-02-27T16:51:39.697

You can write it differently:

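import pandas as pd

def summary(sub_df):
    # Boolean masks for rows strictly before / strictly after the threshold date
    bool_before = sub_df["date_"] < some_date
    bool_after = sub_df["date_"] > some_date

    # Mean of the value column for each subset and for the whole group
    before = sub_df.loc[bool_before, vals_].mean()
    after = sub_df.loc[bool_after, vals_].mean()
    overall = sub_df.loc[:, vals_].mean()

    return pd.Series({"before": before, "after": after, "overall": overall})

# apply passes each group to summary as a DataFrame, so all columns are available
result = df_.groupby(group_).apply(summary)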

The result is a DataFrame with one row per group and three columns holding the before, after and overall means.
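To make the shape of that result concrete, here is a minimal sketch on toy data. The rows, the threshold date and the literal column names "group_", "date_" and "vals_" are assumptions made up for illustration; the question does not show real data. The columns are selected before `apply` so that each group arrives as a DataFrame holding only the columns the function needs.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the ones used in the question.
df_ = pd.DataFrame({
    "group_": ["a", "a", "a", "b", "b"],
    "date_": pd.to_datetime(
        ["2017-01-01", "2017-02-01", "2017-03-01", "2017-01-15", "2017-03-15"]
    ),
    "vals_": [1.0, 2.0, 6.0, 10.0, 20.0],
})
some_date = pd.Timestamp("2017-02-15")  # assumed threshold


def summary(sub_df):
    # sub_df is a DataFrame holding the rows of one group
    before = sub_df.loc[sub_df["date_"] < some_date, "vals_"].mean()
    after = sub_df.loc[sub_df["date_"] > some_date, "vals_"].mean()
    overall = sub_df["vals_"].mean()
    return pd.Series({"before": before, "after": after, "overall": overall})


# One row per group, one column per statistic
result = df_.groupby("group_")[["date_", "vals_"]].apply(summary)
print(result)
```

With this toy data, group a should come out as before = 1.5, after = 6.0, overall = 3.0, and group b as before = 10.0, after = 20.0, overall = 15.0.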

If you require additional summary statistics, you can supply them within the summary function.
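As a sketch of what adding further statistics might look like, the function below returns a few more entries, including one computed on a second column. The column "other_", the statistic names and the toy data are all hypothetical and only serve to illustrate the pattern.

```python
import pandas as pd

# Hypothetical toy data; "other_" is an invented second column.
df_ = pd.DataFrame({
    "group_": ["a", "a", "a", "b", "b"],
    "date_": pd.to_datetime(
        ["2017-01-01", "2017-02-01", "2017-03-01", "2017-01-15", "2017-03-15"]
    ),
    "vals_": [1.0, 2.0, 6.0, 10.0, 20.0],
    "other_": [5, 7, 9, 1, 3],
})
some_date = pd.Timestamp("2017-02-15")  # assumed threshold


def summary(sub_df):
    is_before = sub_df["date_"] < some_date
    return pd.Series({
        "mean_before": sub_df.loc[is_before, "vals_"].mean(),
        "n_before": is_before.sum(),           # number of rows before the threshold
        "vals_max": sub_df["vals_"].max(),     # statistic over the whole group
        "other_median": sub_df["other_"].median(),  # statistic on another column
    })


result = df_.groupby("group_")[["date_", "vals_", "other_"]].apply(summary)
print(result)
```

Each key of the returned Series becomes a column of the result, so adding a statistic only means adding one more entry to the dictionary.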

edited Feb 27 '17 at 16:51

answered Feb 27 '17 at 14:29


Hi Pansen, thanks. Can you see the discussion under the original question? I think this won't work in that case – clog14 Feb 27 '17 at 14:36
@clog14 I adjusted my answer to your new description. Can you update your question description, too? – pansen Feb 27 '17 at 14:45
hi pansen, many thanks. i will try in the application and eventually update the question with a complete toy example. thx clog – clog14 Feb 27 '17 at 16:17
@clog14 Ok cool - just updated `summary` function for better clarity. – pansen Feb 27 '17 at 16:52
