
I have a dataset, df, in which I would like to sum the values of two specific columns into a single column.

Data

id  name1   name2   status
aa  1       0       y
bb  2       2       n
cc  0       1       y

Desired

id  name    status
aa  1       y
bb  4       n
cc  1       y

Doing

df['name'] = df[['name1', 'name2']].agg(axis = 1)

I am running this script, but I think I am missing some key syntax. Any suggestions are appreciated.

Lynn
  • you have not defined any function in agg('np.sum') – Deepak Tripathi Oct 26 '21 at 21:23
  • Do you not mean `df['name'] = df[['name1', 'name2']].sum(axis=1)`? Or perhaps in this case `df.insert(1, 'name', df.pop('name1') + df.pop('name2'))` is better – Henry Ecker Oct 26 '21 at 21:23
  • yes @HenryEcker I believe the first statement will give the desired output _ update: the first soln did not give the desired output. Will try the 2nd soln – Lynn Oct 26 '21 at 21:27
  • If that is the case, do you mind if I close this as a duplicate of [Pandas: sum DataFrame rows for given columns](https://stackoverflow.com/q/25748683/15497888)? – Henry Ecker Oct 26 '21 at 21:28
  • oh I see _ I will try second soln now – Lynn Oct 26 '21 at 21:33
  • Forewarning that `insert` is one of the few inplace operations in the entire library. Don't do `df = df.insert(...)`; you'll end up with `df` being `None`. Just do `df.insert` on a line by itself. – Henry Ecker Oct 26 '21 at 21:34
  • The second soln worked- I did df.insert(1, 'name', df.pop.... so I should do: df.update(1, 'name', df.pop.... ? – Lynn Oct 26 '21 at 21:35
  • No sorry. That was a typo I've fixed it now. My mistake. `insert` is the correct one, just don't assign back by mistake. – Henry Ecker Oct 26 '21 at 21:37
  • oh no worries this works fine. Can post as dupe thank you – Lynn Oct 26 '21 at 21:40
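
The `insert`/`pop` approach from the comments can be sketched as follows. This is a minimal illustration that rebuilds the question's sample data by hand; the `returned` variable is hypothetical and exists only to show why the result of `insert` should not be assigned back to `df`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": ["aa", "bb", "cc"],
    "name1": [1, 2, 0],
    "name2": [0, 2, 1],
    "status": ["y", "n", "y"],
})

# pop() removes each column from df and returns it as a Series;
# insert() then places the summed column at position 1, modifying df in place.
returned = df.insert(1, "name", df.pop("name1") + df.pop("name2"))

print(returned)  # insert() returns None, so never write df = df.insert(...)
print(df)
```

Because both `pop` calls are evaluated before `insert` runs, `df` holds only `id` and `status` at the moment of insertion, so the new column lands between them.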

1 Answer


Simply doing:

df['name'] = df['name1'] + df['name2']
df.drop(['name1','name2'],axis=1, inplace=True)

will work. Pandas columns can be added or multiplied together using the standard operators, which are applied element-wise (row by row). This also works with string columns, or with any column dtype for which the operator is defined for each pair of values.
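
As a small illustration of operators on non-numeric columns, here is a sketch using a made-up frame (the `people` frame and its column names are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data, used only to illustrate element-wise operators
people = pd.DataFrame({
    "first": ["Ada", "Alan"],
    "last": ["Lovelace", "Turing"],
    "a": [1, 2],
    "b": [3, 4],
})

# + on string columns concatenates row by row
people["full"] = people["first"] + " " + people["last"]

# * on numeric columns multiplies row by row
people["product"] = people["a"] * people["b"]

print(people)
```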

If you would rather use the aggregation methods, you could do:

df['name'] = df[['name1','name2']].sum(axis=1)
df.drop(['name1','name2'],axis=1, inplace=True)

Both give the desired values in df (the new name column is appended as the last column). There's nothing wrong with doing things in two lines; it's often easier to read and understand when you reuse the code.
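
For completeness, here is a self-contained sketch of both approaches on the question's sample data. The `make_df` helper is hypothetical, and `drop(columns=...)` with reassignment is used here in place of `inplace=True` purely as a style choice; the final column selection only reorders the columns to match the desired layout.

```python
import pandas as pd

def make_df():
    """Rebuild the sample data from the question (hypothetical helper)."""
    return pd.DataFrame({
        "id": ["aa", "bb", "cc"],
        "name1": [1, 2, 0],
        "name2": [0, 2, 1],
        "status": ["y", "n", "y"],
    })

# Approach 1: add the two columns with the + operator
df_a = make_df()
df_a["name"] = df_a["name1"] + df_a["name2"]
df_a = df_a.drop(columns=["name1", "name2"])
df_a = df_a[["id", "name", "status"]]  # reorder to match the desired layout

# Approach 2: row-wise sum over the selected columns
df_b = make_df()
df_b["name"] = df_b[["name1", "name2"]].sum(axis=1)
df_b = df_b.drop(columns=["name1", "name2"])
df_b = df_b[["id", "name", "status"]]

print(df_a)
print(df_b)
```

One difference worth knowing: `+` propagates missing values (NaN), whereas `sum(axis=1)` skips them by default, so the two approaches can differ on data with gaps.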

Emir