Updating a subset dataframe to update the parent dataframe

Question

I have a 4x5 dataframe (df). I created two child dataframes from it, one 4x1 and one 4x2, and updated both. In the first case the parent is updated; in the second it is not. How can I ensure that the parent dataframe is updated when a child dataframe is updated?

In detail: from the parent df I created dfA with a single column (4x1) and dfB with two columns (4x2). Both subsets contain NaN values. When I call fillna on dfA and dfB, the NaN values in each child are replaced with the given value, as expected. However, when I then check the parent dataframe, the updated values appear in the first case (4x1) but not in the second (4x2). Why is that, and what should I do so that changes to a child dataframe are reflected in the parent dataframe?

import numpy as np
import pandas as pd

studentnames = ['Maths','English','Soc.Sci', 'Hindi', 'Science']
semisteronemarks = [15, 50, np.nan, 50, np.nan]
semistertwomarks = [25, 53, 45, 45, 54]
semisterthreemarks = [20, 50, 45, 15, 38]
semisterfourmarks = [26, 33, np.nan, 35, 34]
semisters = ['Rakesh','Rohit', 'Sam', 'Sunil']
df1 = pd.DataFrame([semisteronemarks,semistertwomarks,semisterthreemarks,semisterfourmarks],semisters, studentnames)

# case 1
dfA = df['Soc.Sci']
dfA.fillna(value = 98, inplace = True)
print(dfA)
print(df)

# case 2
dfB = df[['Soc.Sci', 'Science']]
dfB.fillna(value = 99, inplace = True)
print(dfB)
print(df)

## contents of parent df ->>
## Actual Output -
# case 1
        Maths  English  Soc.Sci  Hindi  Science
Rakesh     15       50     98.0     50      NaN
Rohit      25       53     45.0     45     54.0
Sam        20       50     45.0     15     38.0
Sunil      26       33     98.0     35     34.0

# case 2
        Maths  English  Soc.Sci  Hindi  Science
Rakesh     15       50      NaN     50      NaN
Rohit      25       53     45.0     45     54.0
Sam        20       50     45.0     15     38.0
Sunil      26       33      NaN     35     34.0


## Expected Output -
# case 1
        Maths  English  Soc.Sci  Hindi  Science
Rakesh     15       50     98.0     50      NaN
Rohit      25       53     45.0     45     54.0
Sam        20       50     45.0     15     38.0
Sunil      26       33     98.0     35     34.0

# case 2
        Maths  English  Soc.Sci  Hindi  Science
Rakesh     15       50     99.0     50      NaN
Rohit      25       53     45.0     45     54.0
Sam        20       50     45.0     15     38.0
Sunil      26       33     99.0     35     34.0

# note the difference in output for column Soc.Sci in case 2.

I can't reproduce the issue, but that may be because we have different pandas versions. Your code should give you the `SettingWithCopyWarning`, though. See [here](https://www.dataquest.io/blog/settingwithcopywarning/) and [here](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) on how to deal with it. — Georgy, May 02 '19 at 10:48
Why not make the changes in the parent df itself and then create child df? — Sid, May 02 '19 at 11:36

score 0 · Answer 1 · answered May 02 '19 at 12:00

In your code, `df1` is defined but `df` is not.

With the approach you are using, assign the child back to the parent after filling it:

# case 1
dfA = df1['Soc.Sci']   # changed df to df1; a single label returns a Series
dfA = dfA.fillna(value=98)   # rebind the result instead of using inplace=True

df1['Soc.Sci'] = dfA   # dfA is a Series, not a dataframe, so assign it directly
# If you want to write df1['Soc.Sci'] = dfA['Soc.Sci'] instead,
# dfA must be a dataframe, so select it with double brackets:
# dfA = df1[['Soc.Sci']]


# case 2
dfB = df1[['Soc.Sci', 'Science']]   # changed df to df1
dfB = dfB.fillna(value=99)

df1[['Soc.Sci', 'Science']] = dfB[['Soc.Sci', 'Science']]

print(df1)
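
As an alternative way to write a child back into its parent, here is a minimal sketch using `DataFrame.update`, which aligns on index and column labels. The small `parent` and `child` frames are hypothetical stand-ins for `df1` and `dfB`, not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame standing in for df1 from the question.
parent = pd.DataFrame(
    {"Maths": [15, 25], "Soc.Sci": [np.nan, 45.0], "Science": [np.nan, 54.0]},
    index=["Rakesh", "Rohit"],
)

# Take an explicit copy so the child is unambiguously independent.
child = parent[["Soc.Sci", "Science"]].copy()
child = child.fillna(99)

# update() modifies parent in place, overwriting its values with the
# child's non-NA values wherever index and column labels match.
parent.update(child)
print(parent)
```

Columns that are absent from the child, such as `Maths` here, are left as they were.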

I would suggest simply calling fillna on the parent df and assigning the result back to the column:

df1['Soc.Sci'] = df1['Soc.Sci'].fillna(value=99)

score 0 · Answer 2 · answered May 02 '19 at 12:57

You should have seen a warning:

Warning (from warnings module):
...
SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

It means that dfB may be a copy instead of a view, and according to the results it is. There is little that can be done here; specifically, you cannot force pandas to generate a view. The choice depends on parameters known only to pandas and its developers.
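
To make the copy behaviour concrete, here is a small sketch that takes the copy explicitly with `.copy()`, so the outcome does not depend on what pandas decides for the bare selection. The frame and the names `parent` and `child` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

parent = pd.DataFrame(
    {"Soc.Sci": [np.nan, 45.0], "Science": [np.nan, 54.0]},
    index=["Rakesh", "Rohit"],
)

# .copy() guarantees an independent child, so filling it in place is
# safe and raises no SettingWithCopyWarning.
child = parent[["Soc.Sci", "Science"]].copy()
child.fillna(99, inplace=True)

child_nans = int(child.isna().sum().sum())
parent_nans = int(parent.isna().sum().sum())
print(child_nans)   # 0
print(parent_nans)  # 2
```

The child is filled while the parent still holds both NaN values, which matches what the question observed in case 2.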

But it is always possible to assign to the columns of the parent DataFrame:

# case 2
df = pd.DataFrame([semisteronemarks,semistertwomarks,semisterthreemarks,semisterfourmarks],semisters, studentnames)
df[['Soc.Sci', 'Science']] = df[['Soc.Sci', 'Science']].fillna(value = 99)
print(df)
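
A variation on the same idea, sketched under the assumption that only some columns should be filled: `fillna` accepts a dict that maps column labels to fill values, so the parent can be filled directly without selecting a child at all. The frame below is a hypothetical stand-in for the one in the question.

```python
import numpy as np
import pandas as pd

parent = pd.DataFrame(
    {"Maths": [15, 25], "Soc.Sci": [np.nan, 45.0], "Science": [np.nan, 54.0]},
    index=["Rakesh", "Rohit"],
)

# Each key is a column label and each value is the fill for that column;
# columns that are not listed are left untouched.
parent = parent.fillna(value={"Soc.Sci": 99, "Science": 99})
print(parent)
```

Because the result is assigned back to `parent`, no view or copy of a slice is involved.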
