
Hi, I am trying to add a new column ("A") to an existing DataFrame, where the values will be 1 or 3 depending on the value in another column ("B"):

df["A"] = np.where(df["B"] == "reported-public", 1,3)

When doing so I am getting the warning message:

<ipython-input-239-767754e40f8a>:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Any idea why?

Thanks

Henry
Max

2 Answers

0

I have made some dummy data as follows, to the best of my ability given your limited sample:

import numpy as np
import pandas as pd
data = []
data.append([1, "reported-private"])
data.append([2, "reported-private"])
data.append([3, "reported-public"])
df = pd.DataFrame(data, columns=['Number', 'B'])

Running the command you provided with numpy 1.19.5 and pandas 1.2.4:

df["A"] = np.where(df["B"] == "reported-public", 1,3)

I get the following output, which is probably the one you're expecting:

Number      B               A
1       reported-private    3
2       reported-private    3
3       reported-public     1

Now, the warning is hinting that you might want to use .loc from pandas itself; .apply is another option if you need extra functionality. For example:

df['A'] = df.apply(lambda row: 1 if row.B == 'reported-public' else 3, axis = 1)

The output of this approach is the same as before:

Number      B               A
1       reported-private    3
2       reported-private    3
3       reported-public     1

To sum up, it might be a version problem; if it is, try changing the version or try the second approach. Cheers.
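For reference, the .loc form that the warning message mentions can be sketched as below. This is only an illustration on the dummy data from above, not necessarily a fix: if df was itself created as a slice of another DataFrame, the warning may still appear.

```python
import pandas as pd

# Same dummy data as above
df = pd.DataFrame(
    [[1, "reported-private"], [2, "reported-private"], [3, "reported-public"]],
    columns=["Number", "B"],
)

# Fill the new column with the default value first...
df["A"] = 3
# ...then overwrite only the matching rows through .loc[row_indexer, col_indexer]
df.loc[df["B"] == "reported-public", "A"] = 1

print(df)
```

The boolean mask selects the rows and "A" selects the column, so the assignment happens in a single indexing step on df rather than on an intermediate object.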

You can always disable this warning, as shown below (taken from this post):

import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
Warkaz
  • Hi, thanks. I used the second option, df['A'] = df.apply(lambda row: 1 if row.B == 'reported-public' else 3, axis = 1), and still got the same message: :1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. – Max Oct 08 '21 at 09:06
  • The version of numpy is 1.20.1 and pandas 1.2.4 – Max Oct 08 '21 at 09:08
  • @Max I have edited in the answer to disable this kind of behavior for now and linked a relevant post that might help. Check the other post for more detail. Good luck! Might also try ```df.loc[df['A'], 'B'] = new_val``` as suggested in the post. – Warkaz Oct 08 '21 at 13:12
0

Any idea why?

A very simple explanation is that you are slicing the data and then trying to assign a value to the slice. Is this slice the same object as your original DataFrame, or a separate copy? We don't know exactly what pandas is doing underneath. In some situations the value will be assigned into your original DataFrame; in others it will only land in a copy. If it works, then it probably got assigned correctly. That's why it's a warning and not an error.

Here is a link with a more detailed explanation: How to deal with SettingWithCopyWarning in Pandas
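To make the copy-versus-original point concrete, here is a minimal sketch. It assumes that df in the question was produced by filtering a larger DataFrame; the question does not show that step, so the name raw and the filter condition are hypothetical. Under that assumption, taking an explicit .copy() is one common way to tell pandas that df is meant to be an independent object.

```python
import numpy as np
import pandas as pd

# Hypothetical source frame: the question does not show how df was created.
raw = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "B": ["reported-private", "reported-private", "reported-public", "other"],
})

# Filtering gives back a new object that pandas may still track as
# "a slice of raw". Without .copy(), adding a column to it is the kind
# of assignment that can trigger SettingWithCopyWarning in pandas 1.x.
df = raw[raw["B"] != "other"].copy()

# df is now explicitly independent, so this assignment is unambiguous.
df["A"] = np.where(df["B"] == "reported-public", 1, 3)

print(df)
print("A" in raw.columns)  # the original frame is left untouched
```

The trade-off is a little extra memory for the copy, in exchange for making it clear that changes to df are not expected to show up in raw.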

EBDS
  • Thanks, yes I am getting the result I need, just wanted to understand the logic behind the warning – Max Oct 08 '21 at 09:08
  • @Max Does this answer your question ? If yes, please accept the answer by clicking the 'tick'. Thanks. – EBDS Oct 08 '21 at 09:47