TL;DR: Use .loc[:, 'foo'] instead of .foo


When does pandas assign values to the view and when does it assign values to the copy?

AFAIK, pandas returns either a view or a copy, depending on the operation you use.

You can change the original dataframe if you assign a value to the view but you can't change the original if a value is assigned to the copy.

However, the behavior below confuses me. Why does assigning a value to a view work with a DataFrame but not with a Series?

import pandas as pd

dd = pd.DataFrame([
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 4},
    {'a': 4, 'b': 3},
])

dd[dd.a == 1] = pd.DataFrame([{'a': 100, 'b': 200}]) # Assigning value works. 

dd
>>  a   b
0   100 200
1   2   4
2   4   3

As expected, the value of the first row has been changed.

However, as seen below, assigning a value to a Series doesn't work, even though the setup is identical except that I select a column (a Series) before assigning.

dd = pd.DataFrame([
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 4},
    {'a': 4, 'b': 3},
])

dd[dd.a == 1].a = 1000 # Assigning value doesn't work.  

dd
>>  a   b
0   1   2
1   2   4
2   4   3

I'm on pandas 0.19.1, though (because I'm using Python 2.7).

user8491363
  • Please don't use Python 2.7, since it's officially dead. Pretty much none of the users who will assist you here can assist you with Python 2.7 – Akshay Sehgal Aug 27 '20 at 09:53
  • IIUC, you need to use `.loc` or `.iloc` to assign values, i.e. `dd.loc[dd['a'] == 1, 'a'] = 1000` which returns new view – Umar.H Aug 27 '20 at 09:56
  • Does this answer your question? [Set value for particular cell in pandas DataFrame using index](https://stackoverflow.com/questions/13842088/set-value-for-particular-cell-in-pandas-dataframe-using-index) – Umar.H Aug 27 '20 at 09:56
  • https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy – Wouter Aug 27 '20 at 09:58
  • @AkshaySehgal Not up to me. Some companies, including mine, have too much legacy code, so they still have to stick to Python 2. Working in a deprecated version of the language is painful, but it's something I have to deal with. – user8491363 Aug 27 '20 at 10:41
  • ah, I understand then. There are a lot of new users on SO who end up using 2.7 because they find some old tutorials. But this is something you can't do much about. – Akshay Sehgal Aug 27 '20 at 10:42

1 Answer

TL;DR

I think you have answered your question yourself. Assigning values to a view will behave the way you are expecting while assigning values to a copy will NOT modify the original data frame.

In essence:

  • dd[dd.a == 1].a is a copy of the values of column a taken from a slice of the dataframe
  • dd.a[dd.a == 1] is a view of a slice (by condition) of another slice (dd.a) of the same dataframe.

Reading dd[dd.a == 1] or dd.loc[dd.a == 1] selects rows with a boolean mask, and pandas returns a new object (a copy) for that kind of selection. Assigning to dd[dd.a == 1] works anyway, because the assignment is a single set operation on dd itself. The loc form is the recommended one.

dd[dd.a == 1] #Rows selected by a condition; on the left of "=" this sets values in dd directly

dd.loc[dd.a == 1] #The same selection with loc (recommended)

When you pull a specific column from this selection and assign to it, you end up assigning to a copy -

dd[dd.a == 1].a = 100
#This syntax basically says -
#"From the rows of dd selected by the condition, give me the values of 'a' and set them to 100"
#The assignment here goes to a copy and not to a view
   a  b
0  1  2
1  2  4
2  4  3

Therefore assignment will have no effect on the original dataframe.
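To make the two steps visible, here is a minimal sketch that splits the chained statement into its parts. The name tmp is made up for this illustration; pandas creates an unnamed temporary object in the one-line version.

```python
import pandas as pd

dd = pd.DataFrame([
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 4},
    {'a': 4, 'b': 3},
])

# Step 1: the selection runs first and produces a separate object.
# `tmp` is a hypothetical name used only in this sketch.
tmp = dd[dd.a == 1]

# Step 2: the attribute assignment targets that object, not dd.
# Some pandas versions emit a SettingWithCopyWarning at this point.
tmp.a = 1000

print(tmp)  # the temporary object holds the new value
print(dd)   # dd still has a == 1 in its first row
```

In the one-line form the temporary object is discarded right after the assignment, which is why nothing appears to happen.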

If you want to do assignment for a specific column then you need to do this -

dd.a[dd.a == 1] = 100
#This syntax basically says -
#"From the slice dd.a, give me another slice based on the condition and set that slice to 100"
#The assignment happens on a sliced view of the dataframe itself
     a  b
0  100  2
1    2  4
2    4  3
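For comparison, here is a sketch of the same update written with .loc, as the question's TL;DR and one of the comments suggest. It passes the row condition and the column label in one indexer, so it does not depend on whether an intermediate selection is a view or a copy. As far as I know, dd.a[dd.a == 1] = 100 is itself a form of chained indexing and is not propagated to dd by newer pandas versions that use Copy-on-Write, so treat this as the safer spelling rather than the only one.

```python
import pandas as pd

dd = pd.DataFrame([
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 4},
    {'a': 4, 'b': 3},
])

# Row mask and column label in a single indexer:
# one set operation performed directly on dd.
dd.loc[dd['a'] == 1, 'a'] = 100

print(dd)  # column a of the matching row is updated, column b is untouched
```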

Hope that answers your question.

Akshay Sehgal