3

I am trying to concatenating two columns in pandas DataFrame. The problem is when there is None value exist in either series, the result is NaN. Since the real data is very large and there is value to keep original None values for later reference, I hope not to change the original value in the columns. Is there a way to achieve this in pandas?

To create an example DataFrame:

import pandas as pd
f = pd.DataFrame([['a', 'b','c','a', 'b','c'],['1', '2','3', '4', '5','6', ]])
f = f.transpose()
f.columns = ['xx', 'yy']
f.xx[0] = None
f.yy[0] = None
f.xx[2] = None
f.yy[3] = None

    xx      yy
0   None    None
1   b       2
2   None    3
3   a       None
4   b       5
5   c       6

I tried f['new_str'] = f.xx + f.yy and f['new_str'] = f['xx'] + f['yy']. Both set the concatenated value to NaN if any of the value is None type. I think this is due to how pandas handle None type. The None type and str type is not "addable" by the '+' operator.

    xx      yy      new_str
0   None    None    NaN
1   b       2       b2
2   None    3       NaN
3   a       None    NaN
4   b       5       b5
5   c       6       c6

Here is what I want to do:

f['new_str'] = f.xx.map(lambda x: '')
for idx, arow in f.iterrows():
    con = ''
    if arow.xx:
        con += arow.xx
    if arow.yy:
        con += arow.yy
    f.loc[idx,'new_str'] = con
f
    xx      yy      new_str
0   None    None    
1   b       2       b2
2   None    3       3
3   a       None    a
4   b       5       b5
5   c       6       c6  

My question is that does pandas support a more elegant/simple way to achieve this?

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194
Bin
  • 3,645
  • 10
  • 33
  • 57

1 Answers1

6

Call fillna on each column to set the Nones to '', which is the identity element under string concatenation.

f['new_str'] = f.xx.fillna('') + f.yy.fillna('')

This gives a new column formatted the way you wanted:

>>> f
     xx    yy new_str
0  None  None        
1     b     2      b2
2  None     3       3
3     a  None       a
4     b     5      b5
5     c     6      c6
pneumatics
  • 2,836
  • 1
  • 27
  • 27
  • 2
    Same method but not as clean: `f.apply(lambda row: (row['xx'] or '') + (row['yy'] or ''),axis=1)` – Liam Foley Dec 16 '15 at 01:09
  • @pneumatics Thanks. In the original data, the original field need not to be changed, because 'None' and empty string convey different meaning. Is there a way do do this without change the original column? Thanks. – Bin Dec 16 '15 at 01:12
  • 1
    @Bin try it out, this won't change the values in the `xx` or `yy` columns because they aren't assigned on the left-hand side. `fillna` works on a copy of the data. – pneumatics Dec 16 '15 at 01:15
  • @Liam Foley Thanks. I just tried it out. Your code have the desired behavior. The (row['xx'] or '') + (row['yy'] or '') trick is really interesting, and I have not seen before. Can you explain it in a answer? Like how the () + () structure works and how the row['xx'] or '' works? Thanks. – Bin Dec 16 '15 at 01:21
  • 2
    FYI, Liam Foley's solution is 1.54x faster on my machine. It does not create any large intermediate objects either, so the memory usage is better as well. – Martin Valgur Dec 16 '15 at 01:31
  • @Martin Valgur Thanks for testing out the speed. – Bin Dec 16 '15 at 01:32
  • @pneumatics Right. I tried your solution out. It also does not change the original value. So the solution is great!. Thanks. – Bin Dec 16 '15 at 02:27
  • @bin It's called a ternary operator http://stackoverflow.com/questions/394809/does-python-have-a-ternary-conditional-operator it takes the first value that is 'Truthy', sometimes called coalesce. – Liam Foley Dec 16 '15 at 03:05
  • @Bin Also see this http://stackoverflow.com/questions/16353729/pandas-how-to-use-apply-function-to-multiple-columns – Liam Foley Dec 16 '15 at 03:05
  • @Liam Foley Cool! I went to the link and now I understand the short cut. ( row['xx'] or '') is a short format for (row['xx'] if row['xx'] or ''), a sort format for the cases when the conditional test and one of the value is the same. Thanks! – Bin Dec 16 '15 at 05:39