I have a dataframe `df_KO` with the following columns: `Date`, `Reg` (Registration Number) and a measure of the quality of this particular component (`Status`).
For a certain use, I want to replace the values of these measurements with either `KO` or `OK` depending on a threshold (0.25), using:

```python
df_KO.Status = df_KO.apply(lambda row: replace_func(row["Status"]), axis=1)
```

where `replace_func` is:

```python
import numpy as np

def replace_func(value):
    if value < 0.25:
        return "OK"
    elif value >= 0.25:
        return "KO"
    else:
        # reached for NaN, since every comparison with NaN is False
        return np.nan
```
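For comparison, the same threshold mapping can be sketched without a row-wise `apply`, working on the whole `Status` column at once. This is a minimal sketch: `replace_vectorized` is a hypothetical helper name, and it assumes `Status` may be stored as object dtype (the printed values suggest this), so it converts to numbers first.

```python
import numpy as np
import pandas as pd

def replace_vectorized(status: pd.Series, threshold: float = 0.25) -> pd.Series:
    """Hypothetical helper: "OK" below threshold, "KO" at or above it, NaN kept as NaN."""
    # Assumption: Status may be object dtype; coerce to float (unparseable -> NaN)
    values = pd.to_numeric(status, errors="coerce")
    # NaN < threshold is False, so NaN rows get "KO" here and are masked below
    labels = pd.Series(np.where(values < threshold, "OK", "KO"), index=status.index)
    # Restore NaN wherever the input value was missing
    return labels.where(values.notna())
```

Usage would be `df_KO["Status"] = replace_vectorized(df_KO["Status"])`; like the original `apply`, this overwrites the column in place on whatever object `df_KO` refers to.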
However, I also want to keep the original measurement values, so I made a copy of the DataFrame. But to my surprise, the following code:
```python
df_KO_raw = df_KO
df_KO_raw["Reg"] = "Dummy"
print(df_KO_raw.head(5))
df_KO.Status = df_KO.apply(lambda row: replace_func(row["Status"]), axis=1)
print(df_KO_raw.head(5))
```
gives the following output:

```
                  Date    Reg      Status
0  2017-02-02 11:52:37  Dummy           0
1  2017-02-04 03:02:57  Dummy           0
2  2017-01-27 12:50:53  Dummy  0.00406572
3  2017-01-29 10:58:50  Dummy  0.00754577
4  2017-01-31 22:43:44  Dummy   0.0037902
                  Date    Reg  Status
0  2017-02-02 11:52:37  Dummy      OK
1  2017-02-04 03:02:57  Dummy      OK
2  2017-01-27 12:50:53  Dummy      OK
3  2017-01-29 10:58:50  Dummy      OK
4  2017-01-31 22:43:44  Dummy      OK
```
Why is this happening? I'm clearly performing the operation only on `df_KO`, so why does it change the values in `df_KO_raw` as well? Oh, and it works the other way too: setting `Reg` on `df_KO_raw` has also changed the values in `df_KO`.
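A small reproduction that may help narrow this down, using a made-up two-row frame (the values are illustrative, not the real data). It checks whether `df_KO_raw = df_KO` creates a new DataFrame or only a second name for the same one, and compares that with an explicit `DataFrame.copy()`:

```python
import pandas as pd

# Illustrative stand-in for df_KO (same column names as in the question)
df_KO = pd.DataFrame({
    "Date": pd.to_datetime(["2017-02-02 11:52:37", "2017-02-04 03:02:57"]),
    "Reg": ["A1", "B2"],
    "Status": [0.0, 0.3],
})

alias = df_KO             # plain assignment: binds a second name to the same object
snapshot = df_KO.copy()   # DataFrame.copy() (deep=True by default): separate data

print(alias is df_KO)     # True
print(snapshot is df_KO)  # False

alias["Reg"] = "Dummy"    # mutates the single shared DataFrame
print(df_KO["Reg"].tolist())     # ['Dummy', 'Dummy']
print(snapshot["Reg"].tolist())  # ['A1', 'B2']
```

If `alias is df_KO` is `True`, any change made through either name is visible through both, which would be consistent with the output above.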
I'm probably overlooking something obvious, but at this point I can't really see what. Thanks in advance.
Regards