python pandas groupby complex calculations

Question

I have dataframe df with columns: a, b, c,d. And i want to group data by a and make some calculations. I will provide code of this calculations in R. My main question is how to do the same in pandas?

library(dplyr)
df %>%
    group_by(a) %>%
    summarise(mean_b = mean(b),
              qt95 = quantile(b, .95),
              diff_b_c = max(b-c),
              std_b_d = sd(b)-sd(d)) %>% 
    ungroup()

This example is synthetic, I just want to understand pandas syntaxis

This is not a valid question. Most pandas users don't know R, or even if they did, there seldom is a 1-1 mapping between R and pandas functions. The right way to ask this question would be to show your data (a few rows at least), provide some expected output, and an explanation on how to get there. — cs95, Dec 19 '17 at 11:42
You may check [here](https://stackoverflow.com/questions/43172970/python-pandas-groupby-aggregate-on-multiple-columns) — akrun, Dec 19 '17 at 11:44
real data is absolutely unimportant. You can generate it via any random generators. I chosed very simple R functions like mean, sd, max, quantile. — Slavka, Dec 19 '17 at 11:45
What do you mean, it's unimportant? If you want to understand pandas syntax, please go look at the documentation. If you have a specific programming question, that's where SO comes in. — cs95, Dec 19 '17 at 11:46
Okay, you can generate dataframe so: df = pd.DataFrame({ 'a':np.random.choice([1,2,3,4], 100), 'b': np.random.choice(range(100), 100), 'c': np.random.choice(range(100), 100), 'd': np.random.choice(range(100), 100) }) — Slavka, Dec 19 '17 at 11:49
You should also edit your question with these details, that makes your question more valuable as a community resource. — cs95, Dec 19 '17 at 11:50

score 6 · Accepted Answer · answered Dec 19 '17 at 11:45

I believe you need custom function with GroupBy.apply:

def f(x):
    mean_b = x.b.mean()
    qt95 = x.b.quantile(.95)
    diff_b_c = (x.b - x.c).max()
    std_b_d = x.b.std() - x.d.std()
    cols = ['mean_b','qt95','diff_b_c','std_b_d']
    return pd.Series([mean_b, qt95, diff_b_c, std_b_d], index=cols)

df1 = df.groupby('a').apply(f)

python pandas groupby complex calculations

1 Answers1