Given this dataframe

import pandas as pd

df = pd.DataFrame({'x': range(10, 51, 10),
                   'y': [False] * 5})
print(df)
--------
    x      y
0  10  False
1  20  False
2  30  False
3  40  False
4  50  False

Is there a way to query that dataframe on x and force pandas to return a view that I can modify sometime in the future?

view = df.loc[df.x <= 20]
print(view._is_view) # returns False
# ... life goes by for a while
view.y = True # does not modify original df

I know I could do this

df.loc[df.x <= 20, 'y'] = True

but in my case, the query and the assignment need to be separated in both time and code. My current workaround is to grab the index labels from the query and modify the original dataframe directly instead of messing with the view.

Note I have omitted this for simplicity, but in my actual app, I need to assign each row of the view one by one, separated by time. The view would be slick if I could get it to work.
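For reference, the workaround described above might look roughly like the sketch below. The name `pending` is made up for illustration; the idea is to store only the index labels at query time and write to the original dataframe row by row later.

```python
import pandas as pd

df = pd.DataFrame({'x': range(10, 51, 10),
                   'y': [False] * 5})

# Query time: keep only the index labels of the matching rows.
# `pending` is a hypothetical name, not part of any pandas API.
pending = df.index[df.x <= 20]

# ... life goes by for a while

# Assignment time: write to the original dataframe one row at a time.
for label in pending:
    df.at[label, 'y'] = True

print(df)
```

This never relies on view semantics, since every write goes through `df` itself. It does assume the index labels are unique and that `df` is not reindexed between the query and the assignment.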

bigh_29

1 Answer

The pandas documentation currently provides very little guidance on this. I could not find a documented list of situations in which a view is guaranteed to be returned. In particular, .loc is not guaranteed to return a view, as your example demonstrates.

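From my understanding, two of the conditions that make it more likely for a view to be returned when using .loc are selecting the rows with a slice (such as 0:1) rather than a boolean mask, and selecting a single column.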

In your specific case that would mean changing df.loc[df.x <= 20] to df.loc[0:1,'y'], as follows:

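import pandas as pd

df = pd.DataFrame({'x': range(10, 51, 10),
                   'y': [False] * 5})
view = df.loc[0:1, 'y']  # label slice on rows, single column
print(view._is_view)  # returns True (_is_view is a private attribute and may change between versions)
view[:] = True  # writes through to df only if view shares its data with df
print(df)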

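which, on pandas versions without Copy-on-Write enabled (it is opt-in in pandas 2.x and the planned default in pandas 3.0), results in: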

    x      y
0  10   True
1  20   True
2  30  False
3  40  False
4  50  False

Whether this is applicable to your use case depends on whether the rows selected by the condition on x are contiguous, as they are in your simplified example.
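One way to check whether a given query selects adjacent rows is sketched below. The helper `contiguous_slice` is a hypothetical name, not a pandas function; it returns a positional slice when the matching rows sit next to each other and None otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(10, 51, 10),
                   'y': [False] * 5})

def contiguous_slice(mask):
    """Return a positional slice covering the True entries of mask,
    or None if there are no True entries or they are not adjacent."""
    positions = np.flatnonzero(np.asarray(mask))
    if positions.size == 0 or (np.diff(positions) != 1).any():
        return None
    return slice(int(positions[0]), int(positions[-1]) + 1)

print(contiguous_slice(df.x <= 20))           # rows 0 and 1 are adjacent
print(contiguous_slice(df.x.isin([10, 30])))  # rows 0 and 2 are not
```

Note that the slice is positional and end-exclusive, so it suits df.iloc or df.index[...], whereas df.loc[0:1, 'y'] above slices by label and includes both endpoints. Whether the resulting selection is actually a view still depends on the pandas version and settings.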

phnx