Good morning all. I have a DataFrame with 460,000 rows and 15 columns. I'm trying to assign to one column the product of two others. The code looks like this:

df[df.colx == 'S']['prd'] = df['col1']*df['col2']

prd, col1 and col2 all have dtype float64. I have run many operations on other columns without any problem, including date differences, and they execute almost instantly. If I try

df['prd'] =  df['col1']*df['col2']

the execution is very fast. The problem arises when I try to apply the operation to a subset of the DataFrame. Can someone help me and explain how I can lower the execution time? Thank you very much!

UPDATE: if I do

df2 = pd.DataFrame(df[df.colx=='S'])

and then

df2['prd'] =  df['col1']*df['col2']

it is still very slow. How is that possible? df2 should be a new DataFrame.

Quang Hoang
alex_T
  • Does it not work at all, or does it just take too much time? – PV8 Jun 11 '19 at 09:13
  • I tried waiting a few minutes, but I always stopped the execution because that is too long considering that the whole code takes less than 28 seconds... – alex_T Jun 11 '19 at 09:17
  • Just split the subsetting and the multiplication: `df = df[df.colx == 'S']` and then `df['prd'] = df['col1']*df['col2']` – PV8 Jun 11 '19 at 09:17
  • Thank you PV8! Your solution works, but this way I lose all the != 'S' rows. – alex_T Jun 11 '19 at 09:21
  • Yes. Depending on the following lines, you can create a new DataFrame with `df2 = df[df.colx == 'S']`, leave the original untouched, and use the new one from then on. – PV8 Jun 11 '19 at 09:24
  • With the chaining indices `df[df.colx=='S']['prd'] = ...`, I'm surprised that you didn't get a warning, [details here](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas). Consider using `.loc`. – Quang Hoang Jun 11 '19 at 10:00
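
The `.loc` suggestion in the last comment could look roughly like the sketch below. The sample data is hypothetical; only the column names (`colx`, `col1`, `col2`, `prd`) come from the question. A single `.loc` call with a boolean mask assigns into the original DataFrame, so the rows where `colx != 'S'` are kept rather than dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "colx": ["S", "N", "S", "N"],
    "col1": [1.0, 2.0, 3.0, 4.0],
    "col2": [10.0, 10.0, 10.0, 10.0],
    "prd": [np.nan, np.nan, np.nan, np.nan],
})

# Boolean mask selecting the rows to update.
mask = df["colx"] == "S"

# One .loc call selects rows and column together, so the assignment
# writes into df itself instead of into a temporary copy.
df.loc[mask, "prd"] = df.loc[mask, "col1"] * df.loc[mask, "col2"]
```

Rows outside the mask keep whatever value `prd` had before (NaN in this sample).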

1 Answer

Try to separate the operations:

# .copy() makes df2 independent of df and avoids SettingWithCopyWarning
df2 = df[df.colx == 'S'].copy()
df2['prd'] = df2['col1']*df2['col2']

Or, if `df.colx == 'S'` is a condition you want to apply row by row, you can run:

import numpy
df['prd'] = numpy.where(df['colx'] == 'S', df['col1']*df['col2'], 'Do something else')

Just replace 'Do something else' with the value or expression that should be used where df.colx != 'S'.
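
As a concrete illustration of the `numpy.where` approach, here is a minimal sketch on hypothetical sample data. It assumes that rows where `colx != 'S'` should simply keep their current `prd` value; that fallback is an assumption, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "colx": ["S", "N", "S", "N"],
    "col1": [1.0, 2.0, 3.0, 4.0],
    "col2": [10.0, 10.0, 10.0, 10.0],
    "prd": [0.5, 0.5, 0.5, 0.5],
})

# Where colx == 'S' take the product, otherwise keep the existing prd value
# (assumed fallback). Both branches are float, so the result stays float64.
df["prd"] = np.where(df["colx"] == "S", df["col1"] * df["col2"], df["prd"])
```

Using a numeric fallback rather than a string keeps the column numeric; mixing a string placeholder with floats would turn the whole result into strings.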

PV8