Pandas: Join two columns in dataframe (without creating two)

Question

I have the following data:

    Col1 Col2 Col3 Col4
     A    G     K
     B          L   Q
     C    H     M
     D          N   R
     E    I     O   
     F    J     P

This is what I would like:

    Col1 NEW  Col3 
     A    G     K
     B    Q     L   
     C    H     M
     D    R     N   
     E    I     O   
     F    J     P

I've tried using the following solution and it sort of worked the way I needed. What ended up happening was that I would have dataframe A and then dataframe B. B would contain my new column and A would remain unchanged.

II have attempted to use .replace() and .fillna() but I can't seem to get the syntax right. For some reason when I try to do something like:

      csv[Col2] = csv.replace(to_replace=" ", value=csv[Col4]])

I end up having the entire contents of Col4 overwrite Col2 (blank spaces included). This seems to happen with .replace() and .fillna().

I'm not sure if it helps but in these situations Col2 would only be missing data where Col4 had data. Any ideas?

SOLUTION

Thank you to everyone that replied. I realize now that I have been chasing my tail because of a syntactical error. This was another line of code I tried but couldn't make work. I got the "One column overwrites another" issue (see above).

    csv[Col2] = csv.fillna(csv[Col4]])

But after revisiting that page John Galt linked below I felt like I was missing something..Then it hit me like a train.

    csv[Col2] = csv[Col2].fillna(csv[Col4]])

This worked perfectly. And I feel I like paying closer attention may have resolved this a lot sooner. Thank you all for your patients!

UPDATE ONE

I am appending more information about my dataframe in case it ever helps anyone in the future.

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 50000 entries, 0 to 49999
    Data columns (total 6 columns):
    Col 1                      50000 non-null object
    Col 2                      4652 non-null object
    Col 3                      50000 non-null object
    Col 4                      45347 non-null object
    Col 5                      50000 non-null object
    Col 6                      50000 non-null object
    dtypes: object(6)
    memory usage: 1.5+ MB
    None

Are the blanks empty strings or `nan`? You should be able to do `df['NEW'] = df[['Col2', 'Col4']].sum(axis=1)` — EdChum, Apr 23 '15 at 21:07
Could you replace all the whitespace with blanks: `df['Col2'] = df['Col2'].str.replace(' ','')` and do the same for Col4 and then my previous comment should work — EdChum, Apr 23 '15 at 21:12
My apologies for the delay. @EdChum, I believe they are just blanks. If I export the dataframe to a CSV then cells show up empty. If I index it and then export it to CSV I will see the NAN values. With that being said I tried your solution and ended up getting a "cannot concatenate 'str' and 'int' objects." I feel this may be partially my fault as my example only showed strings. I didn't feel the actual data mattered as I was attempting to fill empty cells with another column. — Corey, Apr 24 '15 at 13:25
Can you post representative data then as this is now an [x y problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) — EdChum, Apr 24 '15 at 13:27
Or do ```csv.info()``` to see types and then ```csv['col'] == ''``` or ```csv['col'] == ' '``` on the string columns to check for blanks or spaces respectively. — JohnE, Apr 24 '15 at 14:15
@EdChum Using JohnE's suggestions I appended my original post with .info(). Both Col2 and Col4 returned false for blanks or spaces. — Corey, Apr 24 '15 at 14:38

JohnE · Accepted Answer · 2015-04-23T21:56:58.230

2

To avoid creating any new temporary variables or dataframes, you could just replace 'Col2' with new values and then drop 'Col4'. There are a few ways to do that.

Numpy where() is a good general solution here that can handle ''(empty or blank), ' '(space), or nan:

df['Col2'] = np.where( df.Col2 == ' ',   df.Col4, df.Col2 )  # space

df['Col2'] = np.where( df.Col2.isnull(), df.Col4, df.Col2 )  # blank or nan

@EdChum's answer should also work and fillna() may be easiest (as in @JohnGalt's link), but won't work if you have a space character there.

df['Col2'] = df['Col2'].fillna(df.Col4)

Afterwards, just drop 'Col4'

df = df.drop('Col4',axis=1)

edited Apr 23 '15 at 21:56

answered Apr 23 '15 at 21:07

JohnE

29,156
8
79
109

1

Marking this as the answer because it consolidates all of the solutions. I fet I owed it to the community to at least test all three solutions. I was able to get the desired result with the Numpy solution as well as .fillna() (after using the right syntax). I feel as though EdChum's solution would work with a different data set. – Corey Apr 24 '15 at 14:00

Pandas: Join two columns in dataframe (without creating two)

1 Answers1