I have two dataframes with two columns each:

df1:

    C1  C2
0    x   a
1    y   b
2    z   c

df2:

    C1  C2
0    q   s
1    r   u
2    t   v

I want to create a third dataframe in which C2 is the concatenation of the C2 columns from both dataframes, such that:

d3:

    C1  C2
0    q  as
1    r  bu
2    t  cv

To do this I have used: d3['C2'] = d1['C2'] + d2['C2']. This seems to work with one of my columns, as well as with some dummy data I've created. However, for some other columns (which have the exact same data), it doesn't seem to work. Instead, d2['C2'] seems to overwrite d3['C2'], and all I see in that column is the d2['C2'] data.

I tried something like:

df.apply(lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1)

However, not only did it take extremely long (I have hundreds of thousands of rows in my data) but it didn't seem to work.

What am I doing wrong? Why would that method work for one column, but not the other?

pyman
  • It is quite hard to see what's wrong without an example of that. Can you reproduce the problem in an example? – HVNSweeting Oct 26 '17 at 04:59
  • If you're referring to the lambda code, it just returned something that made no sense. – pyman Oct 26 '17 at 05:32
  • Apologies HVNSweeting, I'm trying to find a way to recreate the issue with example data, but it only seems to occur with this particular dataset I'm dealing with. – pyman Oct 26 '17 at 05:52

1 Answer

There are many different ways to do this. I took the fastest method from the answers here and tried it on this example, and it seems to work fine.

I would think the only problem with your d3['C2'] = d1['C2'] + d2['C2'] is that sometimes the data is not of type string, so you need to coerce the data to the proper type with the .astype(str) method.

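import pandas as pd

A = pd.DataFrame({'C1': ['x', 'y', 'z'], 'C2': ['a', 'b', 'c']})
B = pd.DataFrame({'C1': ['q', 'r', 't'], 'C2': ['s', 'u', 'v']})
C = B.copy()  # C1 is taken from B
# Cast both columns to str before concatenating element-wise
C['C2'] = A['C2'].astype(str) + C['C2'].astype(str)
print(C)
  C1  C2
0  q  as
1  r  bu
2  t  cv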
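
One way to check whether a non-string column is the cause is to inspect the dtype before concatenating. The sketch below assumes a hypothetical frame `A_num` whose `C2` column was read as integers rather than strings; the names and values are illustrative and not taken from the asker's dataset.

```python
import pandas as pd

# Hypothetical data: C2 in A_num was parsed as integers instead of strings.
A_num = pd.DataFrame({'C1': ['x', 'y', 'z'], 'C2': [1, 2, 3]})
B = pd.DataFrame({'C1': ['q', 'r', 't'], 'C2': ['s', 'u', 'v']})

# A numeric dtype here means a plain `+` will not concatenate as text.
is_numeric = pd.api.types.is_numeric_dtype(A_num['C2'])
print(is_numeric)

# Coerce both sides to strings, then concatenate element-wise.
C = B.copy()
C['C2'] = A_num['C2'].astype(str) + B['C2'].astype(str)
print(C['C2'].tolist())
```

Checking `df.dtypes` on the real data before and after the cast should show which columns were not read as strings.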
jeffery_the_wind
  • d3 actually is made by copying d2 using d3 = d2.copy(). I actually thought that that may be the issue (ie. I'm using the wrong copy somehow) – pyman Oct 26 '17 at 05:32
  • Ah yes that works just as well, I have updated my answer. – jeffery_the_wind Oct 26 '17 at 05:36
  • Sorry, that reply was meant for another response. I think you're right in that the data type might not be a string (strangely). I'm trying to re-run my script, but this time, forcing the data in those columns to be strings by using str(x). – pyman Oct 26 '17 at 05:54
  • You were correct jeffery_the_wind. It seems to be because my script wasn't reading the data in those columns as strings. I didn't use the astype(str) method, because it seemed to turn empty cells into 'nan' strings. Instead, I changed my script to specify that the data inputted in there always had to be string by using str(input) – pyman Oct 26 '17 at 06:26
  • @pyman OK that's good. Depending on what context you are working in, you may not always have that much control. In the case you wanted to replace all the NaN fields after the dataframe is created, you could use the `.fillna()` function, so that line would read like: `C['C2'] = A['C2'].fillna('').astype(str)+C['C2'].fillna('').astype(str)` – jeffery_the_wind Oct 26 '17 at 13:27
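
The `.fillna('')` suggestion in the last comment can be sketched as follows. The data is hypothetical: one cell in `A` is left empty (NaN) to stand in for the empty cells the asker mentions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one cell in A['C2'] is missing.
A = pd.DataFrame({'C1': ['x', 'y', 'z'], 'C2': ['a', np.nan, 'c']})
B = pd.DataFrame({'C1': ['q', 'r', 't'], 'C2': ['s', 'u', 'v']})

C = B.copy()
# Replace missing values with '' before casting, so an empty cell
# contributes nothing to the concatenated string.
C['C2'] = A['C2'].fillna('').astype(str) + B['C2'].fillna('').astype(str)
print(C['C2'].tolist())  # ['as', 'u', 'cv']
```

Filling before the cast matters here: the missing cell becomes an empty string rather than text that ends up in the result.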