Sum up value in different numbers of columns for each row

Question

I have a data frame including number of sold tickets in different price buckets for each flight.

For each record/row, I want to use the value in one column as an index in iloc function, to sum up values in a specific number of columns.

Like, for each row, I want to sum up values from column index 5 to value in ['iloc_index']

I tried df.iloc[:, 5:df['iloc_index']].sum(axis=1) but it did not work.

sample data:

   A  B  C  D  iloc_value  total
0  1  2  3  2     1
1  1  3  4  2     2
2  4  6  3  2     1

for each row, I want to sum up the number of columns based on the value in ['iloc_value']

for example,

for row0, I want the total to be 1+2
for row1, I want the total to be 1+3+4
for row2, I want the total to be 4+6

Welcome to SO. please include sample data and the expected output **as text** in your question, and read https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Roy2012, Jul 04 '20 at 12:59
For row1, how do you get to 1+2+3 from the values in that row? shouldn't it be 1+3+4? — Roy2012, Jul 04 '20 at 13:12

lytseeker · Answer 1 · 2020-07-04T13:41:18.510

EDIT: I quickly got the results this way:

First define a function that can do it for one row:

def sum_till_iloc_value(row):
    return sum(row[:row['iloc_value']+1])

Then apply it to all rows to generate your output:

df_flights['sum'] = df_flights.apply(sum_till_iloc_value, axis=1)

   A  B  C  D  iloc_value  sum
0  1  2  3  2           1    3
1  1  3  4  2           2    8
2  4  6  3  2           1   10

PREVIOUSLY: ~~Assuming you have information that looks like:~~

df_flights = pd.DataFrame({'flight':['f1', 'f2', 'f3'], 'business':[2,3,4], 'economy':[6,7,8]})
df_flights

    flight  business    economy
0   f1  2   6
1   f2  3   7
2   f3  4   8

you can sum the columns you want as below:

df_flights['seat_count'] = df_flights['business'] + df_flights['economy']

This will create a new column that you can later select:

df_flights[['flight', 'seat_count']]


 flight seat_count
0   f1  8
1   f2  10
2   f3  12

Thanks for your quick reply. But I want to sum up values in different number of columns for each row, based on value in a column. I have added sample data — yang, Jul 04 '20 at 13:10
I just updated the answer to work as per the data you provided, hope that helps! — lytseeker, Jul 04 '20 at 13:40

Roy2012 · Answer 2 · 2020-07-04T15:16:39.650

Here's a way to do that in a fully vectorized way: melting the dataframe, summing only the relevant columns, and getting the total back into the dataframe:

d = dict([[y, x] for x, y in enumerate(df.columns[:-1])])

temp_df = df.copy()
temp_df = temp_df.rename(columns=d)
temp_df = temp_df.reset_index().melt(id_vars = ["index", "iloc_value"])
temp_df = temp_df[temp_df.variable <= temp_df.iloc_value]
df["total"] = temp_df.groupby("index").value.sum()

The output is:

   A  B  C  D  iloc_value  total
0  1  2  3  2           1      3
1  1  3  4  2           2      8
2  4  6  3  2           1     10

Did this answer your question? – Roy2012 Jul 04 '20 at 15:16 — Roy2012, Jul 04 '20 at 15:16

Sum up value in different numbers of columns for each row

2 Answers2