0

I want to create a new column called "part_1_total" that pastes together all the values of the columns that contain the string 'Part 1' (same should be done for the next set of columns that contain 'Part 2' , Part 3' etc...)

Is there a quick way to do this?

My attempts:

# Attempt 1 yields 0 as it is to sum up numbers 
def calc_total(df,string='Part 1'):
    return df.loc[:,[x for x in df.columns if string in x]].sum(axis=1)

# Attempt number 2 pastes the column names into all the cells
asos['part_1_total'] = ''.join(asos.loc[:,[x for x in asos.columns if 'Part 1' in x]])


enter image description here

New Dev
  • 48,427
  • 12
  • 87
  • 129
sophocles
  • 13,593
  • 3
  • 14
  • 33
  • Possible duplicate: https://stackoverflow.com/questions/38152389 – Bill Huang Oct 23 '20 at 14:31
  • Also possibly duplicate: https://stackoverflow.com/questions/33158417/pandas-combine-two-strings-ignore-nan-values except OP wants to concatenate only over a subset of columns. – jtorca Oct 23 '20 at 15:37

1 Answers1

1

I think this is just str concatenation for a subset of columns.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'Part 1 - Body':[np.nan, '100% Other Material'],
     'Part 2 - Back':['43% Nickle', '20% Aluminum'],
     'Part 1 - Lining':['93% Cotton', '23% Spandex']}
    )

df['part_1_total'] = df[[c for c in df.columns if 'Part 1' in c]].apply(
        lambda x: x.str.cat(sep=', '), axis=1)

Resulting dataframe:

         Part 1 - Body Part 2 - Back Part 1 - Lining                      part_1_total
0                  NaN    43% Nickle      93% Cotton                        93% Cotton
1  100% Other Material  20% Aluminum     23% Spandex  100% Other Material, 23% Spandex

You can adjust how you want to concatenate the strings (with a comma, space, etc.) by adjusting the sep parameter. See this answer for more on concatenating string columns in pandas. You could use ''.join in apply but this doesn't seem to work with NaNs.

jtorca
  • 1,531
  • 2
  • 17
  • 31