
Please help me understand what a view is in Pandas. I know that if we change something in a view, we always change the original object as well.

But a view of an object and the original object have different ids, for example. Does that mean the view is another object that holds a reference to the original object? What is the mechanism?

I tried but can't find an explanation.

import pandas as pd
import numpy as np

df = pd.DataFrame({'x': [1, 2]})
print(df)
df_sub = df[0:1]                     # take the first row with a slice
df_sub.x = -1                        # assign through the slice
print(df_sub._is_view)               # True
print(id(df) == id(df_sub))          # False
print(np.shares_memory(df, df_sub))  # True
Gusev Slava
  • Can you show an example? – OneCricketeer Jan 20 '17 at 13:52
  • @cricket_007 sure, sec – Gusev Slava Jan 20 '17 at 13:59
  • Check out [this article](http://www.scipy-lectures.org/intro/numpy/array_object.html#copies-and-views) and [this post](http://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html) on numpy views vs. copies. A lot of pandas is either analogous to numpy or really is numpy under the surface so this explanation should apply to your case for the most part. – bunji Jan 20 '17 at 14:01
  • If I understand correctly, a view is a subset of the original data... http://stackoverflow.com/questions/17960511/pandas-subindexing-dataframes-copies-vs-views – OneCricketeer Jan 20 '17 at 14:02
  • @bunji Thanks, I've read these 2 articles. But I can't find an explanation of what a `View` is :) – Gusev Slava Jan 20 '17 at 14:48
  • I would like to share a link: [Copies and Views](https://docs.scipy.org/doc/numpy/user/quickstart.html#copies-and-views) – Grijesh Chauhan Mar 10 '19 at 16:42

1 Answer


To understand what a view is, you have to know what an array is. An array is not only the "stuff" (items) you put in it. Among other things, it also needs information about the number of elements, the shape of the array, and how to interpret the elements.

So an array would be an object at least containing these attributes:

class Series:  # schematic only, not the real pandas class
    data: int     # A pointer (memory address) to where your array is stored
    size: int     # The number of items in your array
    shape: tuple  # The shape of your array
    dtype: type   # How to interpret the array's items
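
As a rough illustration, you can look at these pieces on a real NumPy array. This is only a sketch: the names `arr` and `address` are made up for the example, and `__array_interface__` is used here simply as one way to read the address of the first item.

```python
import numpy as np

arr = np.arange(6, dtype=np.int64).reshape(2, 3)

size = arr.size      # number of items
shape = arr.shape    # how the items are arranged
dtype = arr.dtype    # how to interpret each item

# The first entry of __array_interface__["data"] is the memory address
# of the first item, given as a plain integer.
address = arr.__array_interface__["data"][0]

print(size, shape, dtype, address)
```

The address will differ from run to run; the other three values describe how the array object interprets whatever is stored at that address.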

So when you create a view, a new array object is created, but (and that's important) the view's data pointer points into the original array's memory. It may be offset, but it still points to a memory location that belongs to the original array. Even though it shares data with the original, the size, shape, dtype, and so on might have changed, so it requires a new object. That's why they have different ids.
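
A small NumPy sketch of this, assuming a plain one-dimensional `int64` array (the variable names are only for illustration). It shows two separate Python objects whose data pointers land in the same block of memory, with the view's pointer offset by two items:

```python
import numpy as np

original = np.arange(10, dtype=np.int64)
view = original[2:5]  # basic slicing; NumPy returns a view here

is_same_object = view is original                 # two separate Python objects
shares_memory = np.shares_memory(original, view)  # but the same underlying memory
base_is_original = view.base is original          # the view remembers whose memory it uses

# Byte distance between the two data pointers
start_original = original.__array_interface__["data"][0]
start_view = view.__array_interface__["data"][0]
offset_bytes = start_view - start_original        # 2 items * 8 bytes each

view[0] = -1  # write through the view

print(is_same_object, shares_memory, base_is_original, offset_bytes)
print(original[2])  # the write is visible in the original
```

The view has its own `shape` (`(3,)` instead of `(10,)`), which is exactly the kind of difference that requires a new object. Whether a particular pandas operation gives you a view like this depends on the operation and the pandas version, so treat this as a picture of the mechanism rather than a rule for every DataFrame slice.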

Think of it like windows. You have a garden (the array) and several windows. Each window is a different object, but all of them look out at the same garden (yours). OK, granted, with some slicing operations you would have more Escher-like windows, but a metaphor always lacks some details :-)
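
To make the metaphor concrete, here is a sketch with one "garden" and three "windows" in NumPy. The names are invented for the example, and it assumes a contiguous `int64` array so that `reshape` can return a view. The stepped slice is one of the more Escher-like windows: it looks at every third item by using a larger stride.

```python
import numpy as np

garden = np.arange(12, dtype=np.int64)

window_a = garden[:4]            # the first four items
window_b = garden.reshape(3, 4)  # the same items seen as a 3x4 grid
window_c = garden[::3]           # every third item (items 0, 3, 6, 9)

garden[3] = 100  # change something in the garden

# Every window sees the change, each at its own position.
print(window_a[3], window_b[0, 3], window_c[1])

# Same memory, different step sizes (in bytes) between neighbouring items.
print(garden.strides, window_c.strides)
```

All three windows are distinct objects with their own shape and strides, yet none of them owns any items of its own.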

MSeifert
  • Might be good to add [the `stride` attribute](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray) to the list too. Slices of NumPy arrays are a common kind of view which point to the same underlying data but with a different stride (and perhaps offset). – unutbu Jan 20 '17 at 15:18
  • @unutbu I agree that they are important, especially when slicing with steps but I thought they are probably more confusing when understanding "what a view is". – MSeifert Jan 20 '17 at 15:50