Trying to create a new dataframe first spliting the original one in two:
df1 - that contains only rows from original frame which in selected colomn has values from a given list
df2 - that contains only rows from original which in selected colomn has other values, with these values then changed to a new given value.
Return new dataframe as concatenation of df1 and df2
This works fine:
l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})
print(df)
cat val
0 a 1
1 b 2
2 c 3
3 d 4
4 a 5
5 b 6
df['cat'] = df['cat'].apply(lambda x: 'other')
print(df)
cat val
0 other 1
1 other 2
2 other 3
3 other 4
4 other 5
5 other 6
Yet when I define function:
def create_df(df, select, vals, other):
df1 = df.loc[df[select].isin(vals)]
df2 = df.loc[~df[select].isin(vals)]
df2[select] = df2[select].apply(lambda x: other)
result = pd.concat([df1, df2])
return result
and call it:
df3 = create_df(df, 'cat', ['a','b'], 'xxx')
print(df3)
Which results in what I actually need:
cat val
0 a 1
1 b 2
4 a 5
5 b 6
2 xxx 3
3 xxx 4
And for some reason in this case I get a warning:
..\usr\conda\lib\site-packages\ipykernel\__main__.py:10: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
So how this case (when I assign value to a column in a function) is different from the first one, when I assign value not in a function?
What is the right way to change column value?