-1

My input excel (xlsx) file has a format like:

mz     n     n     n     n     g_1     g_1     g_2     g_2     g_2     
1      2     3     4     5     6       7       8       8       8       
1      2     3     4     5     6       7       8       8       8       
1      2     3     4     5     6       7       8       8       8       
1      2     3     4     5     6       7       8       8       8       

When I read the file using pd.read_excel, it somehow adds numbers to each column like:

mz     n     n.1     n.2     n.3     g_1     g_1.1     g_2     g_2.1     g_2.2
1      2     3       4       5       6       7         8       8         8
1      2     3       4       5       6       7         8       8         8      
1      2     3       4       5       6       7         8       8         8   

so I am unable to use groupby to group those with 'n', 'g_1', and so forth. Is there a way to make the groupby work on the specific groups? I tried merging the column header with the same types but to no avail.

Edit: The answer that I've chosen solved the question. However, I do have one additional question. When I add the code from the answer, the resulting grouped dataframe has the columns all out of order. Is there a way to conserve the order of the column names? Thanks!

Bong Kyo Seo
  • 381
  • 2
  • 7
  • 18
  • https://stackoverflow.com/questions/39986925/pandas-multiple-columns-same-name – BENY Nov 01 '17 at 04:52
  • I did find this post before I posted mine. This post wasn't applicable to mine or I don't know how to apply this to my question. – Bong Kyo Seo Nov 01 '17 at 06:08

1 Answers1

1

IIUC, use split then group on the first part before '.':

df.groupby(df.columns.str.split('.').str[0], axis=1).sum()

Output:

   g_1  g_2  mz   n
0   13   24   1  14
1   13   24   1  14
2   13   24   1  14

Where df is:

   mz  n  n.1  n.2  n.3  g_1  g_1.1  g_2  g_2.1  g_2.2
0   1  2    3    4    5    6      7    8      8      8
1   1  2    3    4    5    6      7    8      8      8
2   1  2    3    4    5    6      7    8      8      8
Scott Boston
  • 147,308
  • 15
  • 139
  • 187