This question is a general version of a specific case asked about here.

I have a pandas dataframe with columns that contain integers. I'd like to concatenate all of those integers into a string in one column.

Given this answer, for particular columns, this works:

(dl['ungrd_dum'].map(str) +
 dl['mba_dum'].map(str) +
 dl['jd_dum'].map(str) +
 dl['ma_phd_dum'].map(str))

But suppose I have many (hundreds) such columns, whose names are in a list called dummies. I'm sure there's a Pythonic one-liner that handles all of them. I've tried using map with dummies, but haven't been able to figure it out.

itzy

3 Answers

IIUC you should be able to do

df[dummies].astype(str).apply(lambda x: ''.join(x), axis=1)

Example:

In [12]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.random.randint(0,100, 5), 'b':np.arange(5), 'c':np.random.randint(0,10,5)})
df
Out[12]:
    a  b  c
0   5  0  2
1  46  1  3
2  86  2  4
3  85  3  9
4  60  4  4
In [15]:

cols=['a','c']
df[cols].astype(str).apply(''.join, axis=1)
Out[15]:
0     52
1    463
2    864
3    859
4    604
dtype: object
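
Applied to the question's setup, the same idea can be sketched as follows. The sample values are made up for illustration; only the column names and the dummies list come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names are borrowed from the question.
dl = pd.DataFrame({
    "ungrd_dum": [1, 0, 1],
    "mba_dum": [0, 0, 1],
    "jd_dum": [0, 1, 0],
    "ma_phd_dum": [1, 1, 0],
})
dummies = ["ungrd_dum", "mba_dum", "jd_dum", "ma_phd_dum"]

# Cast every selected column to str, then join each row's values.
dl["combined"] = dl[dummies].astype(str).apply("".join, axis=1)
print(dl["combined"].tolist())  # ['1001', '0011', '1100']
```

Because the values stay strings the whole way through, leading zeros such as the one in '0011' are preserved.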

EDIT

As @JohnE has pointed out you could call sum instead which will be faster:

df[cols].astype(str).sum(axis=1)

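However, on the pandas version this was tested with, that implicitly converts the dtype to float64, so you'd have to cast back to str again and slice the trailing ".0" off. Check the dtype of the result first: if sum already returns strings on your version, the slice below would remove two real characters instead.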

df[cols].astype(str).sum(axis=1).astype(str).str[:-2]
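
A small sketch of why the float round trip deserves care, using made-up 0/1 strings rather than the output of sum itself (whose dtype may differ between pandas versions). It shows two things: a float cannot keep leading zeros, and the [:-2] slice is only safe when the text really ends in ".0".

```python
import pandas as pd

# Hypothetical concatenated dummy strings.
joined = pd.Series(["0101", "1100", "0010"])

# Simulate the float conversion, then undo it the way the answer does.
as_float = joined.astype(float)
recovered = as_float.astype(str).str[:-2]  # drops the trailing ".0"
print(recovered.tolist())  # ['101', '1100', '10']

# The same slice applied to values that are already plain strings
# removes two real characters.
sliced_strings = joined.str[:-2]
print(sliced_strings.tolist())  # ['01', '11', '00']
```

If the leading zeros matter, as they would for dummy columns, the join-based version avoids the problem because it never leaves string form.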
EdChum

from functools import reduce  # reduce is a builtin only on Python 2
from operator import add
reduce(add, (df[c].astype(str) for c in cols), "")

For example:

df = pd.DataFrame({'a':np.random.randint(0,100, 5), 
                   'b':np.arange(5), 
                   'c':np.random.randint(0,10,5)})

cols = ['a', 'c']


In [19]: df
Out[19]: 
    a  b  c
0   6  0  4
1  59  1  9
2  13  2  5
3  44  3  1
4  79  4  4

In [20]: reduce(add, (df[c].astype(str) for c in cols), "")
Out[20]: 
0     64
1    599
2    135
3    441
4    794
dtype: object
ely


The first thing you need to do is convert your DataFrame of numbers into a DataFrame of strings, as efficiently as possible:

dl = dl.astype(str)

Then, you're in the same situation as this other question, and can use the same Series.str accessor techniques as in this answer:

.str.cat()

Using str.cat() you could do:

dl['result'] = dl[dl.columns[0]].str.cat([dl[c] for c in dl.columns[1:]], sep=' ')
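
For the question's case, where only some columns should be combined and no separator is wanted, a minimal sketch might look like this. The sample data and the contents of dummies are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data.
dl = pd.DataFrame({"a": [5, 46, 86], "b": [0, 1, 2], "c": [2, 3, 4]})
dummies = ["a", "c"]  # assumed list of columns to combine

# Convert only the columns of interest to strings.
as_text = dl[dummies].astype(str)

# Start from the first column and concatenate the rest onto it, no separator.
first, rest = dummies[0], dummies[1:]
combined = as_text[first].str.cat([as_text[c] for c in rest], sep="")
print(combined.tolist())  # ['52', '463', '864']
```

Converting only dl[dummies] leaves the other columns (here b) with their original integer dtype.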

str.join()

To use .str.join() you need a series of iterables, say tuples.

df['result'] = df[df.columns[1:]].apply(tuple, axis=1).str.join(' ')

Don't try the above with list instead of tuple: the apply() method can then return a DataFrame, and DataFrames don't have the .str accessor that Series have.
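
The tuple-based variant, restricted to a chosen list of columns and joined without a separator, could be sketched like this. The data and the dummies list are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
dl = pd.DataFrame({"a": [5, 46, 86], "b": [0, 1, 2], "c": [2, 3, 4]})
dummies = ["a", "c"]  # assumed list of columns to combine

# Build one tuple of strings per row, then join each tuple.
row_tuples = dl[dummies].astype(str).apply(tuple, axis=1)
joined_rows = row_tuples.str.join("")
print(joined_rows.tolist())  # ['52', '463', '864']
```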

LeoRochael
  • Cat seems to be faster for me. Lambda was taking around 3 sec for 1 iteration but cat took just 0.08 sec for the same iteration. – Sunil Feb 06 '19 at 11:36