
I want to create a new column by concatenating two columns (float or int) as strings, as shown below.

My current approach seems too complex. Does anyone have a better idea?

import pandas

a=pandas.Series([1,3,5,7,9])
b=pandas.Series([2,4,6,8,10])
c=pandas.Series([3,5,6,5,10])

abc=pandas.DataFrame({'a':a, 'b':b, 'c':c})

abc
   a   b   c
0  1   2   3
1  3   4   5
2  5   6   6
3  7   8   5
4  9  10  10

abc['new']=pandas.Series(map(str,abc.iloc[:,0])).str.cat(pandas.Series(map(str,abc.iloc[:,1])), sep='::')

abc
   a   b   c    new
0  1   2   3   1::2
1  3   4   5   3::4
2  5   6   6   5::6
3  7   8   5   7::8
4  9  10  10  9::10
cwind

3 Answers


Use astype to convert to str:

# to select columns by position, use iloc
abc['new'] = abc.iloc[:,0].astype(str) + '::' + abc.iloc[:,1].astype(str)
print (abc)
   a   b   c    new
0  1   2   3   1::2
1  3   4   5   3::4
2  5   6   6   5::6
3  7   8   5   7::8
4  9  10  10  9::10

# to select columns by name
abc['new'] = abc['a'].astype(str) + '::' + abc['b'].astype(str)
print (abc)
   a   b   c    new
0  1   2   3   1::2
1  3   4   5   3::4
2  5   6   6   5::6
3  7   8   5   7::8
4  9  10  10  9::10

Solution with str.cat:

abc['new'] = abc['a'].astype(str).str.cat(abc['b'].astype(str), sep='::')
print (abc)
   a   b   c    new
0  1   2   3   1::2
1  3   4   5   3::4
2  5   6   6   5::6
3  7   8   5   7::8
4  9  10  10  9::10
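
Since the question mentions float columns as well, here is a minimal sketch of how the same `astype(str)` pattern behaves on floats. The frame `df` and its values are hypothetical, not taken from the question, and the cast to `int` is only reasonable if the values are whole numbers with no missing entries.

```python
import pandas as pd

# Hypothetical float-valued frame (not from the question)
df = pd.DataFrame({'a': [1.0, 3.0, 5.0], 'b': [2.0, 4.0, 6.0]})

# astype(str) keeps the float formatting, so each value ends in ".0"
as_float = df['a'].astype(str) + '::' + df['b'].astype(str)

# Casting to int first drops the ".0"; this assumes whole numbers and no NaN
as_int = df['a'].astype(int).astype(str) + '::' + df['b'].astype(int).astype(str)

print(as_float.tolist())  # ['1.0::2.0', '3.0::4.0', '5.0::6.0']
print(as_int.tolist())    # ['1::2', '3::4', '5::6']
```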
jezrael

You can also do this using map:

abc['d'] = abc['a'].map(str) +'::'+ abc['b'].map(str)
print(abc)

Output:

   a   b   c      d
0  1   2   3   1::2
1  3   4   5   3::4
2  5   6   6   5::6
3  7   8   5   7::8
4  9  10  10  9::10
Rayhane Mama

How about using apply?

abc['new'] = abc.apply(lambda x: '{}::{}'.format(x['a'],x['b']), axis=1)

It is a simple one-liner this way.

Dimgold
  • Thanks for your answer, but Rayhane Mama's may be the simplest solution. – cwind Jul 06 '17 at 07:58
  • Yep, but note that it makes 3 iterations over the data (``map``, ``map`` and assignment). – Dimgold Jul 06 '17 at 08:04
  • It is very slow, because it processes the data row by row. `astype`, `map` and `sum` are faster, because they are vectorized functions. – jezrael Jul 06 '17 at 08:08
  • @jezrael Which one is slow? As far as I know, pandas doesn't multiprocess. – Dimgold Jul 06 '17 at 08:09
  • apply is slow; maybe [this](https://stackoverflow.com/questions/24870953/does-iterrows-have-performance-issues/24871316#24871316) helps. `Jeff` is now a developer of pandas. – jezrael Jul 06 '17 at 08:11
  • Thanks, I was always sure that ``apply`` was somehow a vectorized variation of iteration. – Dimgold Jul 06 '17 at 08:13
  • Yeah, your idea is good. So how can this be made efficient? abc.iloc[:,i].map(str).str.strip('something'), something I have not shown here... – cwind Jul 06 '17 at 08:14
  • @jezrael But isn't ``map`` equivalent to ``apply``? – Dimgold Jul 06 '17 at 08:27
  • Hard question for me, but I think map is simpler and therefore faster, while apply is more complicated. Also, `.apply(axis=1)` processes by rows, so it is slow too. And `map` cannot be used for your solution, because it works only on one column (`Series`). – jezrael Jul 06 '17 at 08:28
  • It is also possible to see the excellent maxu answer comparing solutions - https://stackoverflow.com/a/36911306/2901002 – jezrael Jul 06 '17 at 08:51
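
The speed claims in the comments above can be checked with a rough timing sketch like the following. The frame `big`, its size, and the helper names `with_astype`, `with_map` and `with_apply` are assumptions for illustration. Timings vary by machine and pandas version, so no numbers are shown.

```python
import timeit
import pandas as pd

# Hypothetical larger frame so that differences are measurable
big = pd.DataFrame({'a': range(5000), 'b': range(5000)})

def with_astype(df):
    # Column-wise conversion, then element-wise string concatenation
    return df['a'].astype(str) + '::' + df['b'].astype(str)

def with_map(df):
    # Same idea, converting each column with map(str)
    return df['a'].map(str) + '::' + df['b'].map(str)

def with_apply(df):
    # Row-wise: the lambda is called once per row
    return df.apply(lambda row: '{}::{}'.format(row['a'], row['b']), axis=1)

# Time three runs of each approach and print the total seconds
for fn in (with_astype, with_map, with_apply):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(fn.__name__, round(seconds, 3))
```

All three functions should return the same strings, so the comparison is only about how long each one takes.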