Dynamically select DataFrame columns for groupby in Python

Question

I have a pandas dataframe named Incoming_Tags.

I can group the dataframe by passing the column names to groupby:

Example:

Incoming_Tags.groupby(['Domain', 'Tag_Name', 'Tag_hierarchy', 'html_attributes'])

I want to select the groupby columns dynamically.

By "dynamically" I mean by name: instead of typing the column names each time I call groupby, I have defined a function group_by that does the following:

import numpy as np

def group_by(df, myList=[], *args):
    # Group by the column names in myList and average each count column
    Incoming_tag_groupby = df.groupby(myList).agg({'char_cnt': np.mean, 'line_cnt': np.mean, 'digit_cnt': np.mean, 'sp_chr_cnt': np.mean, 'word_cnt': np.mean})
    return Incoming_tag_groupby

What do you mean by dynamically? By position, like `Incoming_Tags.groupby(Incoming_Tags.columns[:4].tolist())`? – jezrael, May 18 '18 at 06:45
@Arghya, see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – akshat, May 18 '18 at 06:50
Dynamically means by name. Instead of typing the column names each time in groupby, I've defined a function group_by, which does the following: `def group_by(df, myList=[], *args): Incoming_tag_groupby = df.groupby(myList).agg({'char_cnt': np.mean, 'line_cnt': np.mean, 'digit_cnt': np.mean, 'sp_chr_cnt': np.mean, 'word_cnt': np.mean}) return Incoming_tag_groupby` – Arghya Ganguly, May 18 '18 at 07:26
So you need `def group_by(df,myList= [],*args): return df.groupby(myList).mean()`? – jezrael, May 18 '18 at 07:26
Yes. My snippet is `def group_by(df, myList=[], *args): if len(df.columns) == 0 or len(df.columns) == 1: return "groupby not possible" else: Incoming_tag_groupby = df.groupby(myList).agg({'char_cnt': np.mean, 'line_cnt': np.mean, 'digit_cnt': np.mean, 'sp_chr_cnt': np.mean, 'word_cnt': np.mean}) return Incoming_tag_groupby`. Is there a better way? – Arghya Ganguly, May 18 '18 at 07:41

score 1 · Accepted Answer · answered May 18 '18 at 07:34

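If you want to aggregate all numeric columns, call `mean()` on the groupby object. Older pandas versions exclude non-numeric columns by default; since pandas 2.0 you have to pass `numeric_only=True` to get that behavior:

def group_by(df, myList=[], *args):
    return df.groupby(myList).mean(numeric_only=True)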
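
As a quick illustration of grouping by a list of names chosen at runtime, here is a minimal sketch. The sample frame and its values are invented for the example (only the column names come from the question), and `numeric_only=True` is passed explicitly so that the non-numeric column is skipped regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
df = pd.DataFrame({
    "Domain": ["a.com", "a.com", "b.com"],
    "Tag_Name": ["div", "div", "p"],
    "html_attributes": ["class", "id", "class"],
    "char_cnt": [10, 20, 30],
    "word_cnt": [2, 4, 6],
})

# The grouping keys are an ordinary list, so they can be chosen at runtime.
keys = ["Domain", "Tag_Name"]

# numeric_only=True skips the non-numeric html_attributes column.
means = df.groupby(keys).mean(numeric_only=True)
print(means)
```

The result is indexed by the key columns, so a single group can be looked up with something like `means.loc[("a.com", "div")]`.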

Or use a list `c` to name the columns to aggregate:

def group_by(df,myList= [],*args): 
    c = ['char_cnt','line_cnt','digit_cnt','sp_chr_cnt', 'word_cnt']
    return df.groupby(myList)[c].mean()
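
Regarding the "is there a better way?" follow-up in the comments, one possible variation is sketched below. It is not part of the answer above: the `keys` parameter name, the `VALUE_COLS` constant, the `ValueError`, and the sample data are all assumptions made for the illustration. Instead of returning a string when grouping is not possible, it raises an error if a requested key is missing, and it accepts keys picked either by name or by position.

```python
import pandas as pd

# Count columns from the question; the constant name is hypothetical.
VALUE_COLS = ["char_cnt", "line_cnt", "digit_cnt", "sp_chr_cnt", "word_cnt"]

def group_by(df, keys):
    """Average the count columns of df, grouped by the given column names."""
    keys = list(keys)
    missing = [k for k in keys if k not in df.columns]
    if not keys or missing:
        # Raise instead of returning a string so callers cannot miss the failure.
        raise ValueError(f"cannot group by {keys}; missing columns: {missing}")
    # Aggregate only the count columns that exist and are not used as keys.
    values = [c for c in VALUE_COLS if c in df.columns and c not in keys]
    return df.groupby(keys)[values].mean()

# Hypothetical sample data with invented values.
df = pd.DataFrame({
    "Domain": ["a.com", "a.com", "b.com"],
    "Tag_Name": ["div", "div", "p"],
    "char_cnt": [10, 20, 30],
    "line_cnt": [1, 3, 5],
    "digit_cnt": [0, 2, 4],
    "sp_chr_cnt": [1, 1, 1],
    "word_cnt": [2, 4, 6],
})

by_name = group_by(df, ["Domain"])                   # keys chosen by name
by_position = group_by(df, df.columns[:2].tolist())  # keys chosen by position
```

Taking the keys as a required argument also avoids the mutable default `myList=[]` and the unused `*args` from the original signature.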

jezrael


Perfect. Thank you! – Arghya Ganguly, May 18 '18 at 07:43
