I have code where a function/method accepts a Series (a row from a df) and is supposed to modify it in-place, such that the changes are reflected in the original df. However, I seem unable to force the modification to act on a view rather than a copy. Information from the documentation and a related question on Stack Overflow does not resolve the issue, as shown by the example below:

import pandas as pd
pd.__version__ # 0.24.2

ROW_NAME = "r1"
COL_NAME = "B"
NEW_VAL = 100.0

# df I would like to modify in-place
df = pd.DataFrame({"A":[[1], [2], [3,4]], "B": [1.0, 2.0, 3.0]}, index=["r1", "r2", "r3"])

# a row (Series reference) is the input param to a function that should modify df in-place
record = df.loc[ROW_NAME]
record.loc[COL_NAME] = NEW_VAL
assert df.loc[ROW_NAME, COL_NAME] == NEW_VAL #False

The line starting with record.loc results in the familiar warning: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame, which might make sense, except that record appears to reference df and can be modified in-place under some circumstances. An example of this:

record = df.loc[ROW_NAME]
record.loc["A"].append(NEW_VAL)
assert NEW_VAL in df.loc["r1", "A"] # True

My question is: how can I force a modification of the float value at df.loc[ROW_NAME, COL_NAME] in-place from the Series record? Bonus points for clarifying why it is possible to modify column A in-place but not column B in the examples above.

anon01

2 Answers

I think this behavior is confusing because record in this case is a shallow copy of your data frame row.

If you refer to this Stack Overflow post, it sounds like .loc[] is generally expected to return a copy rather than a view, and that assignment will not reach the original data frame if the .loc calls have been chained.

I did confirm that modifying the original data frame directly works:

df.loc[ROW_NAME, COL_NAME] = NEW_VAL
assert(df.loc[ROW_NAME, COL_NAME] == NEW_VAL) # True

And as for the .append still working, this is why I mentioned the "shallow" copy behavior. Your new record copy still contains a reference to the original list in column A. See this post for a refresher on the difference between binding to a new object vs mutating an existing object.
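The binding-versus-mutating distinction can be sketched without pandas at all. In the snippet below a plain dict named `row` stands in for the data frame row, and `copy.copy` stands in for the shallow copy; both are illustrative assumptions, not what pandas does internally.

```python
import copy

# Hypothetical stand-in for one data frame row: a list in "A", a float in "B".
row = {"A": [1], "B": 1.0}

# A shallow copy: a new container whose values are the same objects.
record = copy.copy(row)

# Rebinding: the key "B" in the copy now points at a new float.
# The original container is untouched.
record["B"] = 100.0

# Mutating: the list under "A" is the same object in both containers,
# so appending through the copy is visible through the original.
record["A"].append(100)

print(row["B"])  # 1.0
print(row["A"])  # [1, 100]
```

Assigning to column B is the rebinding case, while calling .append on the list in column A is the mutating case, which would explain why only the second is visible in df.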

  • this is a good start, but I pass the series (`record`) to a function which should modify the df in-place. Is there a way to force `record` to be a view to the original df row? – anon01 Mar 07 '20 at 02:47
  • is there a way you can pass the `ROW_NAME` used to create the series into the function so that you can modify the original dataframe? I don't think pandas provides views. – Lilith Schneider Mar 07 '20 at 03:12
  • Looking at other references, I think the shallow copy is critical, as you've pointed out. At least one other reference suggests it is not possible to force return a view. It looks like my best option is to change the function signature to take the `df` and `row_name`. – anon01 Mar 07 '20 at 03:17

Based on the sources linked in the question and a thorough reading of the documentation, it does not appear possible to enforce returning a view vs copy of a Series generated from a DataFrame row.

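As @Lilith Schneider points out, the original confusion comes from the fact that record = df.loc["r1"] returns a shallow copy: the Series itself is a new object, so assigning to one of its entries does not reach df, but mutable Python objects stored in it (such as the lists in column A) are still shared with df. This mix of copy-like and view-like behavior can lead to unexpected results.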
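The workaround discussed in the comments, changing the function signature to take the df and the row name, could look roughly like the sketch below. The function name `update_record` and its parameters are made up for illustration; the only pandas feature relied on is assignment through a single df.loc[row, col] call on the original data frame.

```python
import pandas as pd


def update_record(df: pd.DataFrame, row_name: str, col_name: str, new_val: float) -> None:
    """Hypothetical helper: write new_val into df at (row_name, col_name) in-place."""
    # One .loc call on the original data frame, so no intermediate Series
    # copy is involved and the assignment lands in df itself.
    df.loc[row_name, col_name] = new_val


# Same data frame as in the question.
df = pd.DataFrame(
    {"A": [[1], [2], [3, 4]], "B": [1.0, 2.0, 3.0]},
    index=["r1", "r2", "r3"],
)

update_record(df, "r1", "B", 100.0)
print(df.loc["r1", "B"])
```

The function needs the row label instead of the row itself, which is a change for callers, but it avoids depending on whether a row Series happens to be a view or a copy.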

anon01