10

Given the following:

df = pd.DataFrame({'col1' : ["a","b"],
            'col2'  : ["ab",np.nan], 'col3' : ["w","e"]})

I would like to be able to create a column that joins the content of all three columns into one string, separated by the character "*" while ignoring NaN.

so that I would get something like that for example:

a*ab*w
b*e

Any ideas?

Just realised there were a few additional requirements, I needed the method to work with ints and floats and also to be able to deal with special characters (e.g., letters of Spanish alphabet).

Bastien
  • 362
  • 2
  • 7
  • 21
  • I test my answer with this: `df = pd.DataFrame({'col1' : ["a","b",3,'ñ'], 'col2' : ["ab",np.nan, 4,'ñ'], 'col3' : ["w","e", 6,'ñ']})` and it still worked – EdChum May 01 '15 at 09:23

6 Answers6

15
In [68]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().values.tolist()), axis=1)
df
Out[68]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e

UPDATE

If you have ints or float you can convert these to str first:

In [74]:

df = pd.DataFrame({'col1' : ["a","b",3],
            'col2'  : ["ab",np.nan, 4], 'col3' : ["w","e", 6]})
df
Out[74]:
  col1 col2 col3
0    a   ab    w
1    b  NaN    e
2    3    4    6
In [76]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
df
Out[76]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e
2    3    4    6   3*4*6

Another update

In [81]:

df = pd.DataFrame({'col1' : ["a","b",3,'ñ'],
            'col2'  : ["ab",np.nan, 4,'ü'], 'col3' : ["w","e", 6,'á']})
df
Out[81]:
  col1 col2 col3
0    a   ab    w
1    b  NaN    e
2    3    4    6
3    ñ    ü    á

In [82]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
​
df
Out[82]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e
2    3    4    6   3*4*6
3    ñ    ü    á   ñ*ü*á

My code still works with Spanish characters

EdChum
  • 376,765
  • 198
  • 813
  • 562
  • Great seems to work perfectly. One issue, if I have a integers in the mix, python isn't happy. Any idea how to convert things to a string on the fly maybe? – Bastien May 01 '15 at 09:12
  • Thanks! Damn I also have annoying characters (letters of spanish so I get the following error): `'ascii' codec can't encode character u'\xf3' in position 6: ordinal not in range(128)` – Bastien May 01 '15 at 09:16
  • 1
    try to solve that last error and if you can't, ask another question, as it is unrelated to your initial question – Julien Spronck May 01 '15 at 10:00
3
In [1556]: df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
Out[1556]: 
0    a*ab*w
1       b*e
2     3*4*�
3     ñ*ü*á
dtype: object
fixxxer
  • 15,568
  • 15
  • 58
  • 76
  • Oops didn't see this answer was the first. Great I should have thought of using apply. I ended up having a few issues with integers and special character (like spanish letters). The answer bellow solves my issue with integers but waiting for an answer on how to deal with special characters like `u'\xf3'`. – Bastien May 01 '15 at 09:14
  • @Bastien my answer still works for ``'\xf3'` but I'm running python 3 – EdChum May 01 '15 at 10:49
2

You can use dropna()

df['col4'] = df.apply(lambda row: '*'.join(row.dropna()), axis=1)

UPDATE:

Since, you need to convert numbers and special chars too, you can use astype(unicode)

In [37]: df = pd.DataFrame({'col1': ["a", "b"], 'col2': ["ab", np.nan], "col3": [3, u'\xf3']})

In [38]: df.apply(lambda row: '*'.join(row.dropna().astype(unicode)), axis=1)
Out[38]: 
0    a*ab*3
1       b*ó
dtype: object

In [39]: df['col4'] = df.apply(lambda row: '*'.join(row.dropna().astype(unicode)), axis=1)

In [40]: df
Out[40]: 
  col1 col2 col3    col4
0    a   ab    3  a*ab*3
1    b  NaN    ó     b*ó
Anish Shah
  • 7,669
  • 8
  • 29
  • 40
2

using pandas.Series.str.cat function:

import pandas as pd
import numpy as np

df = pd.DataFrame(dict(col1=["a","b"],
                       col2=["ab",np.nan], 
                       col3=["w","e"]))
df.T.apply(lambda c: c.str.cat(sep='*'))

will give

0    a*ab*w
1       b*e
dtype: object

if you have mix of int, float and string, you can use:

df.astype(str).T.apply(lambda c: c.replace('nan', np.nan).str.cat(sep='*'))
lisrael1
  • 348
  • 2
  • 7
1
df.apply(lambda row: '*'.join(row.dropna()), axis=1)
Zah
  • 6,394
  • 1
  • 22
  • 34
1
for row in xrange(len(df)):
    s = '*'.join(df.ix[row].dropna().tolist())
    print s
Julien Spronck
  • 15,069
  • 4
  • 47
  • 55