merge dataframe columns with same name BUT without sorting the columns

Question

I have a dataframe with the column names:

'01 2017' '02 2017' ...... '12 2017' '01 2018' '01 2018' ....

so you can see there is a repeat of 01 2018 and 01 2018 for two columns. I would like to sum the columns without rearranging the columns.

I have this code currently:

df.groupby(lambda x:x, axis=1).sum()

However, the ouput is:

'01 2017' '01 2018' ... '12 2017' '12 2018'

This sorts it by the first 2 digits and messes up the arrangement.

Please take a moment to take a look at the advice at https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples and the instructions it has there as to tips on making an answerable question. — Jon Clements, Aug 08 '18 at 14:38

Scott Boston · Answer 1 · 2018-08-08T16:16:11.027

I think you need reindex, using @piRSquared setup modified:

Use df.columns.unique true a list of unique column names in the current order, then use reindex with axis=1 to order the columns after the groupby with sum.

df = pd.DataFrame(
    np.random.randint(10, size=(6, 5)),
    columns='1 2017|2 2017|1 2018|2 2018|2 2018'.split('|')
)
df.groupby(df.columns, axis=1).sum().reindex(df.columns.unique(), axis=1)

Output:

   1 2017  2 2017  1 2018  2 2018
0       9       2       4      15
1       5       0       0      10
2       1       8       5      10
3       1       1       9       8
4       9       0       0       0
5       6       1       4       5

score 2 · Answer 2 · answered Aug 08 '18 at 14:52

2

By using the data from Pir , slightly change it

Data input

np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(10, size=(5, 4)),
    columns='1 2017|2 2017|1 2018|1 2018'.split('|')
)
df
Out[346]: 
   1 2017  2 2017  1 2018  1 2018
0       0       2       7       3
1       8       7       0       6
2       8       6       0       2
3       0       4       9       7
4       3       2       4       3

solution

df.sum(level=0,axis=1) # using `sum`
Out[347]: 
   1 2017  2 2017  1 2018
0       0       2      10
1       8       7       6
2       8       6       2
3       0       4      16
4       3       2       7

answered Aug 08 '18 at 14:52

BENY

317,841
20
164
234

2

I like your solution better. :) +1 "Myopic Scott" they call me. – Scott Boston Aug 08 '18 at 15:00
@ScottBoston ha, that made me lol. I'm guilty as well. I missed the duplicate columns and all that (-: – piRSquared Aug 08 '18 at 15:15
Deleting my answer. Grab setup for your guy's answers. – piRSquared Aug 08 '18 at 15:16

merge dataframe columns with same name BUT without sorting the columns

2 Answers2