Pandas - combine two columns

Question

I have 2 columns, which we'll call x and y. I want to create a new column called xy:

x    y    xy
1         1
2         2

     4    4
     8    8

There shouldn't be any conflicting values, but if there are, y takes precedence. If it makes the solution easier, you can assume that x will always be NaN where y has a value.
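A minimal sketch of the stated rule, assuming the blanks are NaN. The sample frame mirrors the four filled rows of the table and is illustrative only. `Series.fillna` also accepts another Series, so starting from y and filling its gaps from x gives y precedence directly.

```python
import numpy as np
import pandas as pd

# Illustrative data based on the table above; blanks are assumed to be NaN
df = pd.DataFrame({'x': [1, 2, np.nan, np.nan],
                   'y': [np.nan, np.nan, 4, 8]})

# Start from y (it takes precedence) and fill its NaNs with x at the same index
df['xy'] = df['y'].fillna(df['x'])

print(df)
```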

score 4 · Accepted Answer · answered Aug 01 '18 at 14:01


It could be quite simple if your example is accurate:

df = df.fillna(0)      # if the blanks are NaN, this line is needed first (fillna returns a new DataFrame)
df['xy'] = df['x'] + df['y']
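A sketch of the same idea, assuming the blanks are NaN. `fillna` returns a new object instead of changing `df` in place, so it is applied per column here to leave x and y untouched. Addition also sums any row where both columns are filled, so it does not enforce "y takes precedence". The third row is a hypothetical conflict added only to show that.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3],
                   'y': [np.nan, np.nan, 5]})  # row 2 is a hypothetical conflict

# Treat blanks as 0 only for the addition; df['x'] and df['y'] keep their NaNs
df['xy'] = df['x'].fillna(0) + df['y'].fillna(0)

print(df['xy'].tolist())  # [1.0, 2.0, 8.0]: the conflicting row is summed, not resolved in favour of y
```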


SuperStew



Or `df.x.combine_first(df.y)` – Jon Clements Aug 01 '18 at 14:04

Or that will work too. Pandas is like skinning a cat: there's more than one way to do it – SuperStew Aug 01 '18 at 14:04
Awesome, this worked. However, looking at the `combine_first` example, shouldn't it be `df.y.combine_first(df.x)` if you want y to take precedence (in case they both have values)? – JesusMonroe Aug 01 '18 at 14:08
If the blanks are empty strings rather than NaN, your code raises `TypeError: unsupported operand type(s) for +: 'float' and 'str'` – BENY Aug 01 '18 at 14:10
@JesusMonroe yes... put it in order of precedence... it was just an example :) – Jon Clements Aug 01 '18 at 14:13
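The ordering point from the comments in a short sketch. `combine_first` keeps the caller's values and fills only its NaNs from the argument, so the Series that should win goes first. The conflicting third row is hypothetical and added only to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, np.nan],
                   'y': [np.nan, np.nan, 7, 8]})  # row 2 is a hypothetical conflict

# Caller wins: y's values are kept and its NaNs are filled from x
y_first = df['y'].combine_first(df['x'])
# Reversed order: x wins wherever both columns have a value
x_first = df['x'].combine_first(df['y'])

print(y_first.tolist())  # [1.0, 2.0, 7.0, 8.0]
print(x_first.tolist())  # [1.0, 2.0, 3.0, 8.0]
```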

BENY · Answer 2 · 2018-08-01T14:14:32.703

score 3

Note that your columns are now strings, not numeric, so convert them first:

df = df.apply(lambda x : pd.to_numeric(x, errors='coerce'))

df['xy'] = df[['x', 'y']].sum(axis=1)

Alternatively, join the string values row by row:

df['xy'] = df[['x', 'y']].astype(str).apply(''.join, axis=1)

#df[['x','y']].astype(str).apply(''.join,1)
Out[655]: 
0    1.0
1    2.0
2       
3    4.0
4    8.0
dtype: object
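A sketch of the string-join route, assuming the blanks are empty strings, as the output above suggests. If the columns held NaN instead, `astype(str)` would turn them into the text `'nan'`, so the join only works with genuinely empty strings. `pd.to_numeric` can turn the joined text back into numbers.

```python
import pandas as pd

# Assumed input: string columns whose blanks are empty strings
df = pd.DataFrame({'x': ['1.0', '2.0', '', '', ''],
                   'y': ['', '', '', '4.0', '8.0']})

# Concatenate each row; with at most one non-empty cell per row this picks that value
joined = df[['x', 'y']].apply(''.join, axis=1)

# Convert back to numbers; the all-blank row's empty string becomes NaN
df['xy'] = pd.to_numeric(joined, errors='coerce')

print(df)
```

If a row had both values filled, the join would glue them together (for example `'1.04.0'`), which `errors='coerce'` would then turn into NaN.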

edited Aug 01 '18 at 14:14

answered Aug 01 '18 at 14:05

BENY


You don't need the `lambda` here: you can write it as `df.apply(pd.to_numeric, errors='coerce')` – Jon Clements Aug 01 '18 at 14:14
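The answer and this comment combined into one sketch, assuming the blanks are empty strings. `DataFrame.apply` forwards extra keyword arguments to the function, so `errors='coerce'` reaches `pd.to_numeric` without a lambda. Passing `min_count=1` to `sum` keeps an all-blank row as NaN. The default `min_count=0` would turn it into 0.

```python
import pandas as pd

# Assumed input: string columns with empty strings for the blanks
df = pd.DataFrame({'x': ['1', '2', '', '', ''],
                   'y': ['', '', '', '4', '8']})

# Keyword arguments after the function are passed through to pd.to_numeric
df = df.apply(pd.to_numeric, errors='coerce')

# Row-wise sum; a row with no valid value stays NaN instead of becoming 0
df['xy'] = df[['x', 'y']].sum(axis=1, min_count=1)

print(df)
```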

score 0 · Answer 3 · answered Aug 01 '18 at 14:18

You can also use NumPy:

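import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, np.nan, np.nan],
                   'y': [np.nan, np.nan, 4, 8]})

arr = df[['x', 'y']].to_numpy()
# Boolean indexing returns the non-NaN values as one flat array (row by row),
# so it only lines up with the rows if each row has exactly one non-NaN value
df['xy'] = arr[~np.isnan(arr)].astype(int)

print(df)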

     x    y  xy
0  1.0  NaN   1
1  2.0  NaN   2
2  NaN  4.0   4
3  NaN  8.0   8
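A NumPy sketch that keeps the rows aligned and applies the question's rule (y wins, x is the fallback). It uses `np.where` instead of a boolean mask, so it still works when a row has both values or neither. The last two rows are hypothetical additions that show those cases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, np.nan, np.nan, 5, np.nan],
                   'y': [np.nan, np.nan, 4, 8, 9, np.nan]})  # last two rows: conflict, all-NaN

x = df['x'].to_numpy()
y = df['y'].to_numpy()

# Take y where it is present, otherwise x; rows empty in both stay NaN
df['xy'] = np.where(np.isnan(y), x, y)

print(df)
```

The result stays float because NaN has no integer representation. `df['xy'].astype('Int64')` converts it to pandas' nullable integer type if integers are needed.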
