Combining multiple columns in a DataFrame

Question

I have a DataFrame with 40 columns (columns 0 through 39) and I want to group them four at a time:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.binomial(1, 0.2, (100, 40)))

new_df["0-3"] = df[0] + df[1] + df[2] + df[3]
new_df["4-7"] = df[4] + df[5] + df[6] + df[7]
...
new_df["36-39"] = df[36] + df[37] + df[38] + df[39]

Can I do this in a single statement (or in a better way than summing them separately)? The column names in the new DataFrame are not important.

score 2 · Answer 1 · answered Sep 22 '15 at 17:21

You could select out the columns and sum on the row axis, like this.

df['0-3'] = df.loc[:, 0:3].sum(axis=1)

A couple things to note:

Summing like this will ignore missing data while df[0] + df[1] ... propagates it. Pass skipna=False if you want that behavior.
Not necessarily any performance benefit, may actually be a little slower.

score 2 · Answer 2 · answered Sep 22 '15 at 21:09

2

Here's another way to do it:

new_df = df.transpose()  
new_df['Group'] = new_df.index / 4  
new_df = new_df.groupby('Group').sum().transpose()

Note that the divide-by operation here is integer division, not floating-point division.

answered Sep 22 '15 at 21:09

DalekSec

151
4

ayhan · Accepted Answer · 2016-06-05T13:31:55.967

1

I don't know if it is the best way to go but I ended up using MultiIndex:

df.columns = pd.MultiIndex.from_product((range(10), range(4)))
new_df = df.groupby(level=0, axis=1).sum()

Update: Probably because of the index, this was faster than the alternatives. The same can be done with df.groupby(df.columns//4, axis=1).sum() faster if you take into account the time for constructing the index. However, the index change is a one time operation and I update the df and take the sum thousands of times so using a MultiIndex was faster for me.

edited Jun 05 '16 at 13:31

answered Sep 22 '15 at 20:21

ayhan

70,170
20
182
203

Please read: https://stackoverflow.com/questions/49970526/need-help-to-solve-the-unnamed-and-to-change-it-in-dataframe-in-pandas/49970566#comment86991353_49970566 – cs95 Apr 23 '18 at 18:20

score 0 · Answer 4 · answered Sep 22 '15 at 21:00

0

Consider a list comprehension:

df = # your data
df_slices = [df.iloc[x:x+4] for x in range(10)]

Or more generally

df_slices = [df.iloc[x:x+4] for x in range(len(df.columns)/4)]

answered Sep 22 '15 at 21:00

Alex

2,154
3
26
49

Combining multiple columns in a DataFrame

4 Answers4

Linked