
I have a dataframe, df_KO, with three columns: Date, Reg (registration number), and Status, a measure of the quality of this particular component.

For a certain use, I want to replace these measurements with either "OK" or "KO", depending on a threshold, using:

df_KO.Status = df_KO.apply(lambda row: replace_func(row["Status"]), axis=1)

Where replace_func is:

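import numpy as np

def replace_func(value):
    # Values below the 0.25 threshold are acceptable.
    if value < 0.25:
        return "OK"
    elif value >= 0.25:
        return "KO"
    else:
        # Reached only when both comparisons are False, e.g. for NaN.
        return np.nan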
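As an aside, the same thresholding can probably be done without a row-wise apply. The following is only a sketch under assumptions: the `status` values are made up for illustration, and `labels` is a hypothetical name, not something from my actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, including one missing measurement.
status = pd.Series([0.0, 0.00406572, 0.25, 0.9, np.nan])

# Label every value at once: below the threshold is "OK", otherwise "KO".
labels = pd.Series(np.where(status < 0.25, "OK", "KO"), index=status.index)

# NaN compares False against any threshold, so restore missing values explicitly.
labels = labels.where(status.notna())

print(labels.tolist())
```

This should give the same labels as replace_func, with missing measurements left as missing.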

However, I also want to keep the original measurement values, so I tried to make a copy of the DataFrame first. To my surprise, the following code:

df_KO_raw = df_KO
df_KO_raw["Reg"] = "Dummy"
print(df_KO_raw.head(5))
df_KO.Status = df_KO.apply(lambda row: replace_func(row["Status"]), axis=1)
print(df_KO_raw.head(5))

Gives the following output:

                 Date    Reg      Status
0 2017-02-02 11:52:37  Dummy           0
1 2017-02-04 03:02:57  Dummy           0
2 2017-01-27 12:50:53  Dummy  0.00406572
3 2017-01-29 10:58:50  Dummy  0.00754577
4 2017-01-31 22:43:44  Dummy   0.0037902
                 Date    Reg Status
0 2017-02-02 11:52:37  Dummy     OK
1 2017-02-04 03:02:57  Dummy     OK
2 2017-01-27 12:50:53  Dummy     OK
3 2017-01-29 10:58:50  Dummy     OK
4 2017-01-31 22:43:44  Dummy     OK

Why is this happening? I am only applying the operation to df_KO, so why do the values in df_KO_raw change as well? The assignment to df_KO_raw["Reg"] has also changed the values in df_KO.

I'm probably overlooking something obvious, but at this point I can't really see what. Thanks in advance.

Regards

Zeinab Abbasimazar
jeff
  • The problem is with `df_KO_raw = df_KO`; you need `df_KO_raw = df_KO.copy()` to get a new object. – jezrael Nov 22 '17 at 13:07
  • Could you elaborate a bit on why this is? If I do `test = 5; test_2 = test; test = test*2`, this logic does not hold, right? – jeff Nov 22 '17 at 13:13
  • What about [this](https://stackoverflow.com/a/13420016/2901002) or [this](https://stackoverflow.com/q/4794244/2901002)? – jezrael Nov 22 '17 at 13:16
  • @Jeff this is just Python in general, and not specific to pandas. When you say `x = y` in Python (dataframe or not), they essentially become the same thing (or, more accurately, pointers to the same thing). Just play around with this and test with both `x == y` and `x is y` and you'll get the hang of it. If you are coming to Python from another language, this may be different from what you are used to, but it's basically standard Python behavior. – JohnE Nov 22 '17 at 13:44
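
To make the comments above concrete, here is a minimal sketch of the difference between a second name for the same DataFrame and an independent copy, followed by the integer case from the thread. The sample data and the names `alias` and `snapshot` are made up for illustration; only `df_KO`, `test`, and `test_2` come from the discussion.

```python
import pandas as pd

# Hypothetical stand-in for the real data.
df_KO = pd.DataFrame({"Reg": ["A1", "B2"], "Status": [0.1, 0.3]})

alias = df_KO            # a second name bound to the same object
snapshot = df_KO.copy()  # a new, independent object

df_KO["Reg"] = "Dummy"   # changes the object that df_KO and alias both refer to

print(alias is df_KO)            # True
print(snapshot is df_KO)         # False
print(alias["Reg"].tolist())     # ['Dummy', 'Dummy']
print(snapshot["Reg"].tolist())  # ['A1', 'B2']

# The integer example differs because `test = test * 2` rebinds the name
# `test` to a new object; it does not modify the object that test_2 refers to.
test = 5
test_2 = test
test = test * 2
print(test, test_2)              # 10 5
```

So the integer case is not an exception to the rule: the assignment `test_2 = test` also shares one object, but the later line rebinds `test` instead of changing that object in place, whereas setting a column changes the shared DataFrame itself.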
