I have a dataframe as shown below:

df = 
           index     P01  unten   oben     RV   R2_simu
2014-05-23 03:00:00  0.0    0.0    0.9    0.8         0
2014-05-23 06:00:00  0.5    0.7    1.4    0.1         0
2014-05-23 07:00:00  1.0    2.4    2.4    0.6         0
2014-05-23 08:00:00 0.55   15.7   28.0    0.3         0
....

and I tried a loop:

for i in range(0, len(df)):

    if df.P01[i] >= df.RV[i]:
        df.R2_simu[i] = 0 

    elif df.P01[i] < df.RV[i]:
        df.R2_simu[i] = df.RV[i]
    else:
        pass

I expect to receive a new dataframe as shown below,

df = 
           index     P01  unten   oben     RV   R2_simu
2014-05-23 03:00:00  0.0    0.0    0.9    0.8       0.8
2014-05-23 06:00:00  0.5    0.7    1.4    0.1         0
2014-05-23 07:00:00  1.0    2.4    2.4    0.6         0
2014-05-23 08:00:00 0.55   15.7   28.0    0.6       0.6

However, I get a SettingWithCopyWarning. I tried rewriting

 df.R2_simu[i] = df.RV[i]

to

 df.R2_simu[i] = df.RV[i].copy()

But it seems the problem still exists.

Does anyone know how to deal with it? Thanks in advance!

ascripter
Chi

2 Answers

2

SettingWithCopyWarning is a common side effect of using syntax like yours:

df.R2_simu[i] = df.RV[i]

The pandas developers recommend using df.loc[] instead of chained indexing to set elements. Also note that looping with for i in range(0, len(df)): is less common than using df.iterrows or vectorized functions. For instance, this does the same thing as your loop:

df['R2_simu'] = df.apply(lambda row: 0 if row['P01'] >= row['RV'] else row['RV'], axis=1) # it's generally more common to use dict notation in pandas
# OR, if you really like dot notation...
df.R2_simu = df.apply(lambda row: 0 if row.P01 >= row.RV else row.RV, axis=1)
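The "vectorized functions" mentioned above can be sketched with numpy.where. This is only an illustration under assumptions: the sample values are copied from the question, the timestamps are assumed to be the index, and R2_simu is created as a float column so it can hold values such as 0.8.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the timestamps are assumed to be the index.
df = pd.DataFrame(
    {
        "P01": [0.0, 0.5, 1.0, 0.55],
        "RV": [0.8, 0.1, 0.6, 0.3],
        "R2_simu": [0.0, 0.0, 0.0, 0.0],
    },
    index=pd.to_datetime(
        [
            "2014-05-23 03:00:00",
            "2014-05-23 06:00:00",
            "2014-05-23 07:00:00",
            "2014-05-23 08:00:00",
        ]
    ),
)

# Write 0 where P01 >= RV, otherwise write RV.
# Note: a comparison involving NaN is False, so such rows get RV here,
# whereas the original loop would leave them unchanged.
df["R2_simu"] = np.where(df["P01"] >= df["RV"], 0.0, df["RV"])

print(df)
```

Because the whole column is assigned in one statement, there is no chained indexing and no SettingWithCopyWarning.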
2

Try setting the values on the dataframe with loc indexing. Chained indexing such as df.R2_simu[i] = ... may write to an internal copy rather than to df itself, which is what the warning is about. Change your loop to

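for idx in df.index:  # iterate over index labels, since loc is label-based

    if df.loc[idx, "P01"] >= df.loc[idx, "RV"]:
        df.loc[idx, "R2_simu"] = 0

    elif df.loc[idx, "P01"] < df.loc[idx, "RV"]:
        df.loc[idx, "R2_simu"] = df.loc[idx, "RV"]
    else:
        pass  # neither comparison holds, e.g. one value is NaN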

Even better, avoid the loop and use vectorized access:

df.loc[df.loc[:,"P01"] >= df.loc[:,"RV"],"R2_simu"] = 0
df.loc[df.loc[:,"P01"] < df.loc[:,"RV"],"R2_simu"] = df.loc[df.loc[:,"P01"] < df.loc[:,"RV"],"RV"]

Explained from inside to outside:

df.loc[:, "col"] => take every row (:) and the column col

df.loc[x1 >= x2, "R2_simu"] => consider only the rows where x1 >= x2, and the column R2_simu
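To make the inside-to-outside explanation concrete, here is a minimal runnable sketch of the boolean-mask assignment. The values are taken from the question; the mask names below and at_or_above are only illustrative, and a default integer index and a float R2_simu column are assumed.

```python
import pandas as pd

# Sample values from the question (default integer index assumed here).
df = pd.DataFrame(
    {
        "P01": [0.0, 0.5, 1.0, 0.55],
        "RV": [0.8, 0.1, 0.6, 0.3],
        "R2_simu": [0.0, 0.0, 0.0, 0.0],
    }
)

# Inner step: each comparison yields a boolean Series, one entry per row.
below = df["P01"] < df["RV"]
at_or_above = df["P01"] >= df["RV"]

# Outer step: loc selects the masked rows of column R2_simu and assigns in place.
df.loc[at_or_above, "R2_simu"] = 0
df.loc[below, "R2_simu"] = df.loc[below, "RV"]

print(df)
```

Storing the masks in variables avoids computing the same comparison twice and keeps each loc call short.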

ascripter
    You don't need `df.loc[:,"P01"]`; you can just do `df["P01"]`. This can be condensed down to one line with `df['R2_simu']=df['RV'].mul(df['P01'] < df['RV'])`. – Acccumulation Jun 06 '18 at 22:29
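The one-liner in the comment relies on a boolean Series behaving as 1 (True) and 0 (False) under multiplication. A small sketch, using the question's values and comparing against Series.where as an assumed alternative spelling of the same rule:

```python
import pandas as pd

# Sample values from the question.
df = pd.DataFrame(
    {
        "P01": [0.0, 0.5, 1.0, 0.55],
        "RV": [0.8, 0.1, 0.6, 0.3],
    }
)

mask = df["P01"] < df["RV"]

# RV * True keeps RV, RV * False gives 0.0.
via_mul = df["RV"].mul(mask)

# Same rule written explicitly: keep RV where the mask is True, else 0.0.
via_where = df["RV"].where(mask, 0.0)

df["R2_simu"] = via_mul
print(df)
```

On this sample both spellings give the same column. They can differ when RV contains NaN or infinite values, since multiplying those by False does not yield 0.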