  os  hour  day
0  13    14    0
1  19    14    0
2  13    14    0
3  13    14    0
4  13    14    0

Here is my dataframe. I want to add a new column equal to str(os)+'_'+str(hour)+'_'+str(day). I used the apply function to process the dataframe, but it is very slow. Is there a high-performance way to do this? I also tried converting the df to an array and processing every row, but that seems slow too. The dataframe has nearly two hundred million rows.

yanachen
  • Have you tried just doing `df['new'] = df['os'].astype(str) + '_' +df['hour'].astype(str) + '_' + df['day'].astype(str)`? – hellpanderr Apr 02 '18 at 03:54
  • Well, I tried this but it is also very slow. – yanachen Apr 02 '18 at 04:41
  • You can check [this](https://stackoverflow.com/q/19377969/2901002). I think the fastest is `df['new'] = df['os'].map(str) + '_' +df['hour'].map(str) + '_' + df['day'].map(str)`, provided there are no `NaN`s. – jezrael Apr 02 '18 at 05:31
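
For reference, the two column-wise variants suggested in the comments can be sketched on a small stand-in for the frame in the question. The five-row frame and the column name `new_map` are illustrative assumptions, and this says nothing about timing at two hundred million rows.

```python
import pandas as pd

# Small stand-in for the frame in the question (the real one is far larger).
df = pd.DataFrame({"os": [13, 19, 13, 13, 13],
                   "hour": [14, 14, 14, 14, 14],
                   "day": [0, 0, 0, 0, 0]})

# Variant 1: convert each column to str once, then join the Series
# element-wise with '_' as the separator.
df["new"] = (df["os"].astype(str) + "_"
             + df["hour"].astype(str) + "_"
             + df["day"].astype(str))

# Variant 2: the same idea with map(str); this assumes the columns hold no NaN.
df["new_map"] = (df["os"].map(str) + "_"
                 + df["hour"].map(str) + "_"
                 + df["day"].map(str))

print(df["new"].tolist())
# ['13_14_0', '19_14_0', '13_14_0', '13_14_0', '13_14_0']
```

Both variants work one whole column at a time instead of calling a Python function once per row.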

1 Answer


I'm not sure what code you are using, but you can try:

df.astype(str).apply('_'.join, axis = 1)

0    13_14_0
1    19_14_0
2    13_14_0
3    13_14_0
4    13_14_0
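
If the row-wise apply is still too slow at this scale, here is a minimal sketch of two possible alternatives. It assumes the columns hold few distinct (os, hour, day) combinations, which the question does not state. The names `cols`, `uniq` and `key` are illustrative only.

```python
import pandas as pd

# Small stand-in for the frame in the question.
df = pd.DataFrame({"os": [13, 19, 13, 13, 13],
                   "hour": [14, 14, 14, 14, 14],
                   "day": [0, 0, 0, 0, 0]})
cols = ["os", "hour", "day"]

# Option A: a plain list comprehension over the zipped columns,
# which avoids DataFrame.apply(axis=1) altogether.
df["new"] = ["_".join(map(str, row)) for row in zip(*(df[c] for c in cols))]

# Option B: build the string once per distinct combination, then
# broadcast it back to every row with a left merge on the key columns.
uniq = df[cols].drop_duplicates()
uniq = uniq.assign(key=uniq.astype(str).apply("_".join, axis=1))
# Note: merge returns a frame with a fresh 0..n-1 index; row order follows df.
df = df.merge(uniq, on=cols, how="left")

print(df[["new", "key"]])
```

Option B only helps when the number of distinct combinations is small compared with the number of rows. Otherwise the merge may cost more than it saves.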
Vaishali