0

I have multiple columns that have the same name (ie. 3 columns named "box"), and I need to have one column that sums the values so that I can concatenate my dataframes. There are multiple instances of this needing to happen.

I've tried df1 = df.groupby(level=0, axis=1).sum() but I get one column with no values. (Ie. in the first row, "box" : 9, "box" : 1, "box" : 4 should become one column named "box" with value of 14, but it's showing one column named "box" but no value.)

apine19
  • 1
  • 1
  • Have you considered renaming your duplicate column names? – MyNameIsCaleb Sep 27 '19 at 22:55
  • They're being renamed and becoming duplicates in the process. Essentially I have thirty possible column names that fall into one of five buckets... – apine19 Sep 27 '19 at 22:59
  • Can you post some working code example with what you are seeing...i just tried your method and it seemed to work so i think i am not seeing clearly what you are either starting with or wanting to end with. – jtweeder Sep 27 '19 at 23:03
  • That's really unusual. It's hard to create duplicate indices for a reason... – Julia K Sep 27 '19 at 23:11

1 Answers1

2

You can sum based on column names using df['column_name'].sum(axis=1) and setting that for the column you want the results in.

Simple example below:

>>> df
   a  b   c  b   e
0  1  5  dd  1   6
1  2  9  ee  1  10
2  3  1  ff  1   2
>>> df['f'] = df['b'].sum(axis=1)
>>> df
   a  b   c  b   e   f
0  1  5  dd  1   6   6
1  2  9  ee  1  10  10
2  3  1  ff  1   2   2

Notice it isn't summing every value in the row, just the values from columns named 'b'.

This works because when you call a column by name and it's a duplicate, pandas recognizes that fact. For example:

>>> df['a']
0    1
1    2
2    3
Name: a, dtype: int64
>>> df['b']
   b  b
0  5  1
1  9  1
2  1  1
>>> df.columns.get_loc('a')
0
>>> df.columns.get_loc('b')
array([False,  True, False,  True, False, False])

I would recommend renaming your duplicate column names and avoiding the uncertainty that comes with having duplicate names in the future. This question has a lot of suggestions for how to do that as well as how to avoid it during import from other sources.

MyNameIsCaleb
  • 4,409
  • 1
  • 13
  • 31