
I have a pandas DataFrame named Incoming_Tags.

I can group the DataFrame by passing the column names to groupby:

Example:

Incoming_Tags.groupby([ 'Domain','Tag_Name', 'Tag_hierarchy', 'html_attributes'])

I want to select the grouping columns dynamically.

By "dynamically" I mean by name: instead of writing out the column names in every groupby call, I have defined a function group_by that does the following:

import numpy as np

def group_by(df, myList=[], *args):
    # Group by the columns named in myList and average each count column
    Incoming_tag_groupby = df.groupby(myList).agg({'char_cnt': np.mean, 'line_cnt': np.mean, 'digit_cnt': np.mean, 'sp_chr_cnt': np.mean, 'word_cnt': np.mean})
    return Incoming_tag_groupby
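A minimal sketch of calling such a function with grouping keys chosen at run time, assuming a tiny hypothetical frame (only the column names come from the question; the values are invented). It uses the `'mean'` string alias rather than `np.mean`, since recent pandas releases may warn when NumPy callables are passed to `agg`, and it drops the unused `*args` and the `[]` default.

```python
import pandas as pd

# Hypothetical toy data: column names from the question, values invented
df = pd.DataFrame({
    'Domain': ['a.com', 'a.com', 'b.com'],
    'Tag_Name': ['div', 'p', 'div'],
    'char_cnt': [10, 20, 30],
    'line_cnt': [1, 3, 2],
    'digit_cnt': [0, 2, 4],
    'sp_chr_cnt': [1, 1, 0],
    'word_cnt': [2, 4, 6],
})

VALUE_COLS = ['char_cnt', 'line_cnt', 'digit_cnt', 'sp_chr_cnt', 'word_cnt']

def group_by(df, myList):
    # The grouping keys arrive as a list of column names, so the caller
    # decides at run time which columns to group on
    return df.groupby(myList).agg({col: 'mean' for col in VALUE_COLS})

by_domain = group_by(df, ['Domain'])
by_domain_tag = group_by(df, ['Domain', 'Tag_Name'])
print(by_domain)
print(by_domain_tag)
```

The same function serves both calls; only the list of names changes.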
  • What do you mean by dynamically? By position, like `Incoming_Tags.groupby(Incoming_Tags.columns[:4].tolist())`? – jezrael May 18 '18 at 06:45
  • @Arghya, see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – akshat May 18 '18 at 06:50
  • Dynamically means by name. Instead of mentioning the column names each time in groupby, I've defined a function group_by, which does the following: `def group_by(df,myList= [],*args): Incoming_tag_groupby = df.groupby(myList).agg({'char_cnt': np.mean,'line_cnt':np.mean,'digit_cnt':np.mean,'sp_chr_cnt':np.mean,'word_cnt':np.mean}); return Incoming_tag_groupby` – Arghya Ganguly May 18 '18 at 07:26
  • So you need `def group_by(df,myList= [],*args): return df.groupby(myList).mean()`? – jezrael May 18 '18 at 07:26
  • Yes, my snippet is `def group_by(df,myList= [],*args): if len(df.columns) == 0 or len(df.columns) == 1: return "groupby not possible" else: Incoming_tag_groupby = df.groupby(myList).agg({'char_cnt': np.mean,'line_cnt':np.mean,'digit_cnt':np.mean,'sp_chr_cnt':np.mean,'word_cnt':np.mean}); return Incoming_tag_groupby`. Is there a better way? – Arghya Ganguly May 18 '18 at 07:41
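On the guard in the last comment: instead of returning the string "groupby not possible", one option is to check the requested names against `df.columns` and raise a `ValueError`. A sketch, assuming raising suits the caller; `group_by_checked` is a hypothetical name, and the data is invented.

```python
import pandas as pd

def group_by_checked(df, keys, value_cols):
    # Hypothetical helper: reject column names the frame does not have
    missing = [c for c in list(keys) + list(value_cols) if c not in df.columns]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    # groupby needs at least one key
    if not keys:
        raise ValueError("at least one grouping column is required")
    # Average only the requested value columns
    return df.groupby(list(keys))[list(value_cols)].mean()

df = pd.DataFrame({'Domain': ['a.com', 'a.com', 'b.com'], 'char_cnt': [10, 20, 30]})
result = group_by_checked(df, ['Domain'], ['char_cnt'])
print(result)

try:
    group_by_checked(df, ['Tag_Name'], ['char_cnt'])
except ValueError as exc:
    print(exc)  # unknown columns: ['Tag_Name']
```

An exception cannot be mistaken for a result, whereas a returned string could be passed on as if it were a DataFrame.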

1 Answer


If you want to aggregate all numeric columns: non-numeric columns are excluded by default in pandas versions before 2.0 (in 2.0 and later, pass numeric_only=True to mean):

def group_by(df,myList= [],*args):
    # numeric_only=True skips non-numeric columns that are not grouping keys
    return df.groupby(myList).mean(numeric_only=True)
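A minimal sketch of this variant on a hypothetical frame that also holds a text column outside the grouping keys (values invented). In pandas 2.0 and later, GroupBy.mean raises a TypeError on such a column unless numeric_only=True is passed, so the sketch passes it explicitly.

```python
import pandas as pd

def group_by(df, myList):
    # Average every numeric column; non-numeric columns that are not
    # grouping keys (here 'html_attributes') are skipped
    return df.groupby(myList).mean(numeric_only=True)

# Hypothetical toy data; values invented for illustration
df = pd.DataFrame({
    'Domain': ['a.com', 'a.com', 'b.com'],
    'html_attributes': ['class', 'id', 'class'],
    'char_cnt': [10, 20, 30],
    'word_cnt': [2, 4, 6],
})

result = group_by(df, ['Domain'])
print(result)
```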

Or, to aggregate only specific columns, select them with a list c:

def group_by(df,myList= [],*args):
    # Average only the columns listed in c
    c = ['char_cnt','line_cnt','digit_cnt','sp_chr_cnt', 'word_cnt']
    return df.groupby(myList)[c].mean()
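If an agg dictionary like the one in the question is still wanted (for example, to swap in a different function for one column later), it can be generated from the same list with dict.fromkeys(c, 'mean') instead of being written out by hand. A sketch on invented data, comparing it with the [c].mean() form:

```python
import pandas as pd

c = ['char_cnt', 'line_cnt', 'digit_cnt', 'sp_chr_cnt', 'word_cnt']

# Hypothetical toy data; column names from the question, values invented
df = pd.DataFrame({
    'Domain': ['a.com', 'b.com', 'a.com'],
    'char_cnt': [10, 30, 20],
    'line_cnt': [1, 2, 3],
    'digit_cnt': [0, 4, 2],
    'sp_chr_cnt': [1, 0, 1],
    'word_cnt': [2, 6, 4],
})

# Build {'char_cnt': 'mean', 'line_cnt': 'mean', ...} from the list
spec = dict.fromkeys(c, 'mean')
via_agg = df.groupby(['Domain']).agg(spec)
via_select = df.groupby(['Domain'])[c].mean()

print(via_agg.equals(via_select))
```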