I have a data frame which is conditionally broken up into two separate dataframes as follows:
df = pd.read_csv(file, names)
df = df.loc[df['name1'] == common_val]
df1 = df.loc[df['name2'] == target1]
df2 = df.loc[df['name2'] == target2]
# each df has a 'name3' I want to perform a division on after this filtering
The original df is filtered by a value shared by the two dataframes, and then each of the two new dataframes are further filtered by another shared column.
What I want to work:
df1['name3'] = df1['name3']/df2['name3']
However, as many questions have pointed out, this causes a setting with copy warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I tried what was recommended in this question:
df1.loc[:,'name3'] = df1.loc[:,'name3'] / df2.loc[:,'name3']
# also tried:
df1.loc[:,'name3'] = df1.loc[:,'name3'] / df2['name3']
But in both cases I still get weird behavior and the set by copy warning.
I then tried what was recommended in this answer:
df.loc[df['name2']==target1, 'name3'] = df.loc[df['name2']==target1, 'name3']/df.loc[df['name2'] == target2, 'name3']
which still results in the same copy warning.
If possible I would like to avoid copying the data frame to get around this because of the size of these dataframes (and I'm already somewhat wastefully making two almost identical dfs from the original).
If copying is the best way to go with this problem I'm interested to hear why that works over all the options I explored above.
Edit: here is a simple data frame along the lines of what df would look like after the line df.loc[df['name1'] == common_val]
name1 other1 other2 name2 name3
a x y 1 2
a x y 1 4
a x y 2 5
a x y 2 3
So if target1=1
and target2=2
,
I would like df1 to contain only rows where name1=1 and df2 to contain only rows where name2=2, then divide the resulting df1['name3'] by the resulting df2['name3'].
If there is a less convoluted way to do this (without splitting the original df) I'm open to that as well!