
Suppose I have a dataframe like this, with a "dense" first column and a "sparse" second column:

# python 3.7.1, pandas 0.23.4.

import pandas as pd
df = pd.DataFrame({'col1':range(1,5), 'col2': [5, '', 7, '']})

missing_values_index = df[df['col2'] == ''].index

I tried two methods to assign col1 values to col2 missing values.

Method 1 (does not work, df remains unchanged):

df.loc[missing_values_index]['col2'] = df.loc[missing_values_index]['col1']



Method 2 (works ok):

df.loc[missing_values_index, 'col2'] = df.loc[missing_values_index, 'col1']



I thought these were just two ways of writing the same thing. Can someone explain what is really happening here?

Victor Valente
  • [This answer](https://stackoverflow.com/a/53954986/4909087) explains the difference and its consequences in gory detail. – cs95 Mar 13 '19 at 19:27
  • Also, read the docs (https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html?highlight=chained#why-does-assignment-fail-when-using-chained-indexing) on chained assignments. – Scott Boston Mar 13 '19 at 19:30
  • When you print them, the two forms show the same thing; when you assign through them, they do not behave the same. – BENY Mar 13 '19 at 19:33
  • Hi, did my answer solve your problem? If so, feel free to accept it. – Amir Shabani May 17 '19 at 12:50

1 Answer


The second method you mentioned "works ok", so let's talk about why the first method doesn't work!

The core of the problem is that the first method assigns to a copy of the data instead of to df itself. Your first method is equivalent to this:

something = df.loc[missing_values_index]
something['col2'] = df.loc[missing_values_index]['col1']

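This is where the problem shows up. According to the documentation, pandas does not guarantee whether the first line returns a view or a copy. Here, indexing rows with a list of labels returns a copy, so the second line writes into that temporary copy and df is never touched. pandas flags this with a SettingWithCopyWarning. Note that it is only a warning: the assignment still runs, it just lands on the copy. Method 2 avoids the intermediate object altogether, because a single df.loc[rows, column] = ... call sets the values directly on df.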
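To make the difference concrete, here is a small sketch that runs both methods on the dataframe from the question and records col2 after each one. The names after_chained and after_loc are only for illustration, and the warnings filter is there just to keep the demo quiet; I am assuming a pandas version where the chained form only warns rather than raises.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'col1': range(1, 5), 'col2': [5, '', 7, '']})
missing_values_index = df[df['col2'] == ''].index

with warnings.catch_warnings():
    # Silence the chained-assignment warning so the demo output stays clean.
    warnings.simplefilter('ignore')
    # Method 1: df.loc[...] builds a temporary object, and ['col2'] = ...
    # writes into that temporary object, which is then thrown away.
    df.loc[missing_values_index]['col2'] = df.loc[missing_values_index]['col1']

after_chained = df['col2'].tolist()

# Method 2: one .loc call with both row and column labels sets values on df itself.
df.loc[missing_values_index, 'col2'] = df.loc[missing_values_index, 'col1']

after_loc = df['col2'].tolist()

print(after_chained)
print(after_loc)
```

After Method 1 the empty strings should still be in col2, and after Method 2 they should be replaced by the col1 values from the same rows.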

Amir Shabani