
I have a dataframe that I conditionally split into two separate dataframes as follows:

import pandas as pd

df = pd.read_csv(file, names=names)  # column names must be passed by keyword
df = df.loc[df['name1'] == common_val]
df1 = df.loc[df['name2'] == target1]
df2 = df.loc[df['name2'] == target2]
# each df has a 'name3' column I want to divide after this filtering

The original df is first filtered on a value of 'name1', and the result is then split into two dataframes by filtering on two different values of another column, 'name2'.
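To make the filtering concrete, here is a minimal sketch on the small example frame from the edit, assuming `common_val = 'a'`, `target1 = 1` and `target2 = 2` (illustrative values, not from the original code). It shows that boolean filtering keeps the original row labels, so `df1` and `df2` end up with different indexes:

```python
import pandas as pd

# Example data from the edit; common_val, target1 and target2 are assumed values.
df = pd.DataFrame({
    "name1": ["a", "a", "a", "a"],
    "other1": ["x", "x", "x", "x"],
    "other2": ["y", "y", "y", "y"],
    "name2": [1, 1, 2, 2],
    "name3": [2, 4, 5, 3],
})
common_val, target1, target2 = "a", 1, 2

df = df.loc[df["name1"] == common_val]
df1 = df.loc[df["name2"] == target1]
df2 = df.loc[df["name2"] == target2]

# Boolean filtering keeps the original row labels; it does not renumber from 0.
print(df1.index.tolist())  # [0, 1]
print(df2.index.tolist())  # [2, 3]
```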

What I want to do:

df1['name3'] = df1['name3']/df2['name3']

However, as many questions have pointed out, this raises a SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
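For context, a hedged sketch of what the warning is about: `df1` was produced by indexing another frame, so pandas cannot tell whether a later write to `df1` is meant to reach the parent as well. Taking an explicit `.copy()` makes `df1` an independent frame, and the write then only changes `df1`. The data below is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name2": [1, 1, 2, 2], "name3": [2.0, 4.0, 5.0, 3.0]})

# .copy() detaches df1 from df, so assigning to df1 is unambiguous.
df1 = df.loc[df["name2"] == 1].copy()
df1["name3"] = df1["name3"] * 10

print(df1["name3"].tolist())  # [20.0, 40.0]
print(df["name3"].tolist())   # [2.0, 4.0, 5.0, 3.0]  (parent untouched)
```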

I tried what was recommended in this question:

df1.loc[:,'name3'] = df1.loc[:,'name3'] / df2.loc[:,'name3']
# also tried:
df1.loc[:,'name3'] = df1.loc[:,'name3'] / df2['name3']

But in both cases I still get unexpected results and the SettingWithCopyWarning.
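One likely source of the unexpected results (an assumption, since the output isn't shown): arithmetic between two Series aligns on index labels, not on row position. With the example data from the edit, `df1` keeps labels 0 and 1 and `df2` keeps labels 2 and 3, so no label matches and every quotient is NaN:

```python
import pandas as pd

df = pd.DataFrame({"name2": [1, 1, 2, 2], "name3": [2, 4, 5, 3]})
df1 = df.loc[df["name2"] == 1].copy()   # index [0, 1]
df2 = df.loc[df["name2"] == 2]          # index [2, 3]

# Series arithmetic aligns on index labels and returns the union of both indexes.
ratio = df1["name3"] / df2["name3"]
print(ratio.index.tolist())   # [0, 1, 2, 3]
print(ratio.isna().all())     # True: no label appears in both Series

# Assigning back keeps only df1's labels, and those values are all NaN.
df1["name3"] = ratio
print(df1["name3"].tolist())  # [nan, nan]
```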

I then tried what was recommended in this answer:

df.loc[df['name2']==target1, 'name3'] = df.loc[df['name2']==target1, 'name3']/df.loc[df['name2'] == target2, 'name3']

which still produces the same SettingWithCopyWarning.
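The same label alignment applies to the mask-based attempt: the right-hand side is labelled 2 and 3, so it gets matched against rows 0 and 1 on the left and nothing lines up. A minimal sketch of a positional version, assuming both selections have the same length and that `df` is a standalone frame (built directly here, not sliced from another one); `mask1`, `mask2`, `numer` and `denom` are illustrative names:

```python
import pandas as pd

# name3 is stored as float so the ratios can be written back without an upcast.
df = pd.DataFrame({"name2": [1, 1, 2, 2], "name3": [2.0, 4.0, 5.0, 3.0]})
target1, target2 = 1, 2

mask1 = df["name2"] == target1
mask2 = df["name2"] == target2

# .to_numpy() drops the index labels, so rows are paired by position.
# This assumes both selections have the same number of rows.
numer = df.loc[mask1, "name3"].to_numpy()
denom = df.loc[mask2, "name3"].to_numpy()
df.loc[mask1, "name3"] = numer / denom

print(df["name3"].tolist())  # [0.4, 1.3333333333333333, 5.0, 3.0]
```

Because the division happens on plain NumPy arrays, rows are paired purely by their order of appearance.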

If possible, I would like to avoid copying the dataframe to get around this, because these dataframes are large (and I'm already somewhat wastefully making two nearly identical dfs from the original).

If copying is the best approach, I'd like to understand why it works when none of the options above do.
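On the memory concern: as far as I know, selecting rows with a boolean mask already gathers the matching rows into new arrays rather than returning a view of the original, so `.copy()` on the filtered result duplicates only the filtered subset, not the whole file. A quick check with `np.shares_memory` on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name2": [1, 1, 2, 2], "name3": [2.0, 4.0, 5.0, 3.0]})

# Boolean-mask selection gathers the matching rows into new arrays.
filtered = df.loc[df["name2"] == 1]
shared = np.shares_memory(filtered["name3"].to_numpy(), df["name3"].to_numpy())
print(shared)  # False

# So .copy() only duplicates the filtered subset, never the full frame.
df1 = filtered.copy()
print(len(df1), len(df))  # 2 4
```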

Edit: here is a simple dataframe showing what df looks like after the line df = df.loc[df['name1'] == common_val]:

name1  other1  other2  name2  name3
a      x       y       1      2
a      x       y       1      4
a      x       y       2      5
a      x       y       2      3

So if target1=1 and target2=2, I would like df1 to contain only the rows where name2=1 and df2 to contain only the rows where name2=2, and then divide the resulting df1['name3'] by the resulting df2['name3'].
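Assuming rows are paired by their order of appearance within each group (the question doesn't state this explicitly), the desired result for the example is, with $\oslash$ denoting elementwise division:

```latex
(2,\ 4) \oslash (5,\ 3)
  = \left(\tfrac{2}{5},\ \tfrac{4}{3}\right)
  \approx (0.4,\ 1.333)
```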

If there is a less convoluted way to do this (without splitting the original df) I'm open to that as well!
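One way to avoid splitting at all, sketched under the same pairing-by-order assumption: number the rows within each `name2` group with `groupby(...).cumcount()`, then pivot so each `name2` value becomes its own column and the division is an ordinary column operation. The helper column `pos` is a hypothetical name introduced here:

```python
import pandas as pd

df = pd.DataFrame({
    "name1": ["a", "a", "a", "a"],
    "name2": [1, 1, 2, 2],
    "name3": [2, 4, 5, 3],
})
target1, target2 = 1, 2

# pos numbers the rows 0, 1, ... within each name2 group, in order of appearance.
wide = (
    df.assign(pos=df.groupby("name2").cumcount())
      .pivot(index="pos", columns="name2", values="name3")
)

# One column per name2 value; rows with the same pos are paired.
ratio = wide[target1] / wide[target2]
print(ratio.tolist())  # [0.4, 1.3333333333333333]
```

If the groups have different sizes, the unmatched positions come out as NaN instead of raising an error.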

M-Wi
  • Use `df = df.loc[df['name1'] == common_val].copy()`, `df1 = df.loc[df['name2'] == target1].copy()` and `df2 = df.loc[df['name2'] == target2].copy()`, and then `df1['name3'] = df1['name3']/df2['name3'].values` if both DataFrames have the same length – jezrael May 11 '20 at 14:23
  • If they are not the same length, `df1 = df.loc[df['name2'] == target1].reset_index(drop=True).copy()` and `df2 = df.loc[df['name2'] == target2].reset_index(drop=True).copy()` should help, and then `df1['name3'] = df1['name3']/df2['name3']` – jezrael May 11 '20 at 14:25
  • Why is this marked as a duplicate when I say right in my question that I tried what was suggested in that question and it didn't work? – M-Wi May 11 '20 at 14:25
  • Hmm, none of your solutions use `copy`, or am I missing it? – jezrael May 11 '20 at 14:26
  • I can edit my question to reflect this, but I was hoping to avoid copying for performance reasons. That still doesn't explain why the options recommended in the questions I referenced (and the docs) don't work – M-Wi May 11 '20 at 14:27
  • So did you try the `.copy()` solutions and they didn't work? Could you add your code with `copy` to the question? – jezrael May 11 '20 at 14:29
  • OK, reopened, because it is now clear that a solution without copying is needed. – jezrael May 11 '20 at 14:34
  • Please [provide a reproducible copy of the DataFrame with `df.to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246/how-to-provide-a-copy-of-your-dataframe-with-to-clipboard) or a reproducible dataframe: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Trenton McKinney May 11 '20 at 15:51
  • I edited the post to contain a simple example df which satisfies what I'm after, as well as more specific details. – M-Wi May 11 '20 at 18:54
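For comparison, jezrael's two suggestions side by side on the example data (a sketch; the frame is rebuilt here with only the columns that matter). `.values` strips `df2`'s index so the division is positional, while `reset_index(drop=True)` renumbers both frames from 0 so the labels line up:

```python
import pandas as pd

df = pd.DataFrame({"name2": [1, 1, 2, 2], "name3": [2, 4, 5, 3]})

# Variant 1: strip df2's index with .values (requires equal lengths).
df1 = df.loc[df["name2"] == 1].copy()
df2 = df.loc[df["name2"] == 2].copy()
df1["name3"] = df1["name3"] / df2["name3"].values

# Variant 2: renumber both frames 0..n-1 so the labels match by position.
df1b = df.loc[df["name2"] == 1].reset_index(drop=True).copy()
df2b = df.loc[df["name2"] == 2].reset_index(drop=True).copy()
df1b["name3"] = df1b["name3"] / df2b["name3"]

print(df1["name3"].tolist())   # [0.4, 1.3333333333333333]
print(df1b["name3"].tolist())  # [0.4, 1.3333333333333333]
```

Both give the same ratios here. The `.values` variant needs equal lengths; with `reset_index`, rows of `df1` that have no counterpart in `df2` simply get NaN.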
