I have the following problem: in a DataFrame, I want to select specific rows and a specific column, take the first n elements of that selection, and assign a new value to them. Naively, I thought that the following code would do the job:
import seaborn as sns
import pandas as pd
df = sns.load_dataset('tips')
df.loc[df.day=="Sun", "smoker"].iloc[:4] = "Yes"
Both loc and iloc should return a view into the df, so the value should be overwritten. However, the dataframe does not change. Why?
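To make this reproducible without seaborn, here is a minimal sketch on a small hand-made frame (the data is made up for illustration and is not the tips dataset). The same chained assignment leaves the frame untouched here too:

```python
import pandas as pd

# Hypothetical stand-in for the tips data: five "Sun" rows, all non-smokers.
df = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "smoker": ["No", "No", "No", "No", "No", "No"],
})

# Same pattern as above: boolean-mask loc, then positional iloc assignment.
# Depending on the pandas version this may emit a chained-assignment warning.
df.loc[df["day"] == "Sun", "smoker"].iloc[:4] = "Yes"

# Check whether the original frame was modified.
unchanged = bool((df["smoker"] == "No").all())
print(unchanged)
```

If the assignment had gone through to df, `unchanged` would be False.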
I know how to work around it: create a new object first with just the loc selection, then change the value using iloc, and write the result back to the original df (as below).
But a) I do not think it's optimal, and b) I would like to know why the top solution does not work. Why does it return a copy and not a view of a view?
The alternative solution:
df = sns.load_dataset('tips')
tmp = df.loc[df.day=="Sun", "smoker"]
tmp.iloc[:4] = "Yes"
df.loc[df.day=="Sun", "smoker"] = tmp
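For comparison, a single-step variant that avoids the temporary might look like the sketch below. It assumes the index labels are unique (true for a default RangeIndex) and uses a small made-up frame rather than the tips dataset; `first_n` is just an illustrative name. The idea is to compute the labels of the first n matching rows and pass them to a single loc assignment:

```python
import pandas as pd

# Hypothetical stand-in for the tips data.
df = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun", "Sun", "Fri", "Sun", "Sun"],
    "smoker": ["No"] * 7,
})

n = 4
# Index labels of the first n rows where day == "Sun"
# (assumes unique index labels, otherwise extra rows could match).
first_n = df.index[df["day"] == "Sun"][:n]

# One loc call with row labels and a column label assigns into df directly.
df.loc[first_n, "smoker"] = "Yes"

print(df)
```

This does not explain why the chained version fails, but it keeps the whole selection inside one indexing call.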
Note: I have read the docs, this really great post and this question, but they don't explain this. Their concern is the difference between df.loc[mask, "z"] and the chained df["z"][mask].