Combine arbitrary number of columns into one in pandas

Question

This question is a general version of a specific case asked about here.

I have a pandas dataframe with columns that contain integers. I'd like to concatenate all of those integers into a string in one column.

Given this answer, for particular columns, this works:

(dl['ungrd_dum'].map(str) +
 dl['mba_dum'].map(str) +
 dl['jd_dum'].map(str) +
 dl['ma_phd_dum'].map(str))

But suppose I have many (hundreds of) such columns, whose names are in a list, `dummies`. I'm certain there's some cool pythonic way of doing this with one magical line that will do it all. I've tried using `map` with `dummies`, but haven't yet been able to figure it out.

score 3 · Answer 1 · EdChum · 2015-09-03T07:59:34.963

IIUC (if I understand correctly), you should be able to do:

df[dummies].astype(str).apply(lambda x: ''.join(x), axis=1)

Example:

In [12]:

df = pd.DataFrame({'a':np.random.randint(0,100, 5), 'b':np.arange(5), 'c':np.random.randint(0,10,5)})
df
Out[12]:
    a  b  c
0   5  0  2
1  46  1  3
2  86  2  4
3  85  3  9
4  60  4  4
In [15]:

cols=['a','c']
df[cols].astype(str).apply(''.join, axis=1)
Out[15]:
0     52
1    463
2    864
3    859
4    604
dtype: object
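A self-contained sketch of the same idea with fixed data, so the output is reproducible. The column names are taken from the question; the values and the extra `other` column are made up for illustration:

```python
import pandas as pd

# Hypothetical 0/1 dummy columns plus an unrelated column that should be ignored
df = pd.DataFrame({
    "ungrd_dum": [1, 0, 1],
    "mba_dum": [0, 1, 1],
    "jd_dum": [0, 0, 1],
    "other": [10, 20, 30],
})
dummies = ["ungrd_dum", "mba_dum", "jd_dum"]

# Convert the selected columns to strings, then join each row's values with no separator
combined = df[dummies].astype(str).apply("".join, axis=1)
print(combined.tolist())  # ['100', '010', '111']
```

Because the values stay strings throughout, leading zeros (as in `'010'`) are preserved.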

EDIT

As @JohnE has pointed out, you could call `sum` instead, which is likely to be faster:

df[cols].astype(str).sum(axis=1)

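However, in the pandas version used here, that implicitly converts the result to float64, so you'd have to cast back to str again and slice off the trailing `.0` if necessary:

df[cols].astype(str).sum(axis=1).astype(str).str[:-2]

Check the dtype first, though: newer pandas versions may return the concatenated strings directly, in which case `.str[:-2]` would cut off real digits.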
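A minimal sketch of a version-tolerant variant, assuming the float coercion only happens on some pandas versions: it strips the trailing `.0` only when the result actually came back as floats. Note that any float coercion would also drop leading zeros (e.g. `'011'` would become `11.0`), which matters for dummy columns. The data here is made up:

```python
import pandas as pd

df = pd.DataFrame({"a": [5, 46, 86], "c": [2, 3, 4]})
cols = ["a", "c"]

# Row-wise sum of string columns concatenates the strings
summed = df[cols].astype(str).sum(axis=1)

# Only strip a trailing ".0" if this pandas version coerced the result to float
if pd.api.types.is_float_dtype(summed):
    summed = summed.astype(str).str[:-2]

print(summed.tolist())  # ['52', '463', '864']
```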

edited Sep 03 '15 at 07:59

answered Sep 02 '15 at 18:20

EdChum

Indeed, you understand perfectly. Thanks. – itzy Sep 02 '15 at 18:23
(Remember that you can use `''.join` instead of `lambda x: ''.join(x)`.) – DSM Sep 02 '15 at 18:45
@DSM indeed I sometimes forget when I don't need a lambda – EdChum Sep 02 '15 at 18:48
@DSM, is that a pandas thing, or due to python syntax? – itzy Sep 02 '15 at 19:24
Why not just use `sum` once they are strings? `df.astype(str).sum(axis=1)` – JohnE Sep 03 '15 at 02:57
@JohnE although that works, the `dtype` is converted to `float64`; the OP wanted `str`, so you'd have to cast it back to `str` again, but it's probably faster for large dfs – EdChum Sep 03 '15 at 08:00
@EdChum Ah, yeah, good point. I was just thinking the syntax was a little simpler is all. – JohnE Sep 03 '15 at 16:56
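On itzy's question in the comments: passing `''.join` directly is plain Python, not a pandas feature. Looking up a method on an instance gives a bound method, which is an ordinary callable that already carries its receiver (here the empty-string separator), so it can be passed anywhere a function is expected. A small sketch with made-up data:

```python
# "".join is a bound method: a callable that already knows its separator ("")
joiner = "".join
row = ["5", "2"]

# Calling the bound method directly is equivalent to wrapping it in a lambda
via_method = joiner(row)
via_lambda = (lambda x: "".join(x))(row)
print(via_method, via_lambda)  # 52 52
```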

score 1 · Answer 2 · answered Sep 02 '15 at 18:31

from functools import reduce  # needed on Python 3, where reduce is no longer a builtin
from operator import add
reduce(add, (df[c].astype(str) for c in cols), "")

For example:

df = pd.DataFrame({'a':np.random.randint(0,100, 5), 
                   'b':np.arange(5), 
                   'c':np.random.randint(0,10,5)})

cols = ['a', 'c']


In [19]: df
Out[19]: 
    a  b  c
0   6  0  4
1  59  1  9
2  13  2  5
3  44  3  1
4  79  4  4

In [20]: reduce(add, (df[c].astype(str) for c in cols), "")
Out[20]: 
0     64
1    599
2    135
3    441
4    794
dtype: object
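A self-contained version of the example above with fixed data (values copied from the transcript, otherwise hypothetical). `reduce` folds from the left, so this evaluates `("" + df["a"]) + df["c"]`; the empty-string start value is broadcast element-wise against the first Series, and each `+` concatenates strings row by row:

```python
from functools import reduce  # reduce lives in functools on Python 3
from operator import add

import pandas as pd

df = pd.DataFrame({"a": [6, 59, 13], "b": [0, 1, 2], "c": [4, 9, 5]})
cols = ["a", "c"]

# Left fold over the string-converted columns, starting from ""
result = reduce(add, (df[c].astype(str) for c in cols), "")
print(result.tolist())  # ['64', '599', '135']
```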

score 1 · Answer 3 · answered May 26 '17 at 21:12

The first thing you need to do is convert your DataFrame of numbers into a DataFrame of strings, as efficiently as possible:

dl = dl.astype(str)

Then, you're in the same situation as this other question, and can use the same Series.str accessor techniques as in this answer:

`.str.cat()`

Using str.cat() you could do:

dl['result'] = dl[dl.columns[0]].str.cat([dl[c] for c in dl.columns[1:]], sep=' ')
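The line above uses `sep=' '`, which puts a space between values; passing `sep=''` gives the plain concatenation asked for in the question. A minimal sketch with hypothetical dummy data:

```python
import pandas as pd

# Hypothetical integer dummy columns, converted to strings first
dl = pd.DataFrame({"ungrd_dum": [1, 0], "mba_dum": [0, 1], "jd_dum": [1, 1]}).astype(str)

first, *rest = dl.columns
# Concatenate the first column with all the others, row by row, with no separator
dl["result"] = dl[first].str.cat([dl[c] for c in rest], sep="")
print(dl["result"].tolist())  # ['101', '011']
```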

`str.join()`

To use `.str.join()` you need a Series of iterables, say tuples.

dl['result'] = dl.apply(tuple, axis=1).str.join(' ')

Don't try the above with `list` instead of `tuple`: depending on your pandas version, the `apply()` method may then return a DataFrame, and DataFrames don't have the `.str` accessor that Series have.
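A self-contained sketch of the tuple approach on the same kind of hypothetical dummy data, again joining with `''` rather than `' '` to match the question:

```python
import pandas as pd

dl = pd.DataFrame({"ungrd_dum": [1, 0], "mba_dum": [0, 1], "jd_dum": [1, 1]}).astype(str)

# One tuple of strings per row, then join each tuple's elements
tuples = dl.apply(tuple, axis=1)
dl["result"] = tuples.str.join("")
print(dl["result"].tolist())  # ['101', '011']
```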

Cat seems to be faster for me. Lambda was taking around 3 sec for 1 iteration but cat took just 0.08 sec for the same iteration. — Sunil, Feb 06 '19 at 11:36
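Timings like these depend on data size, pandas version and hardware, so it is worth measuring on your own data. A rough harness comparing the `apply` and `str.cat` approaches; the frame size is an arbitrary assumption and no particular result is implied:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 rows of 50 random 0/1 dummy columns, as strings
rng = np.random.default_rng(0)
dl = pd.DataFrame(rng.integers(0, 2, size=(10_000, 50))).astype(str)
cols = list(dl.columns)

def with_apply():
    return dl[cols].apply("".join, axis=1)

def with_cat():
    return dl[cols[0]].str.cat([dl[c] for c in cols[1:]], sep="")

# Sanity check: both approaches must produce the same strings
assert with_apply().tolist() == with_cat().tolist()

for fn in (with_apply, with_cat):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.3f} s per run")
```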
