In pandas it is often important to know whether an object is a view on another DataFrame. Is there a special attribute, or some other way, to distinguish a view from a genuine DataFrame? I tried the following without success:

import pandas as pd

# Create toy data and a view
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
view = df[:]

# Check for public attributes unique to the DataFrame or to the view
view_vars = {x for x in dir(view) if not x.startswith("_")}
df_vars = {x for x in dir(df) if not x.startswith("_")}

view_vars ^ df_vars  # no differences

# Repeat the check for all attributes (including those with a leading underscore)
view_vars = set(dir(view))
df_vars = set(dir(df))

view_vars ^ df_vars  # still no differences

# Check the numpy base values -> same
df.values.base
view.values.base

Can anyone advise me on how to programmatically distinguish a view from a genuine DataFrame (i.e. a copy)?

P.Jo

1 Answer

In the code you provided, both df and view are dataframes. That's why they have exactly the same methods and attributes.

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
view = df[:]
print(type(df))  # <class 'pandas.core.frame.DataFrame'>
print(type(view))  # <class 'pandas.core.frame.DataFrame'>

A DataFrame has no .view() method, but a Series does:

view = df["foo"].view()

print(type(df))  # <class 'pandas.core.frame.DataFrame'>
print(type(view))  # <class 'pandas.core.series.Series'>

view_vars = set((x for x in dir(view) ))
df_vars = set((x for x in dir(df) ))

print(view_vars ^ df_vars)  # differences

Source: https://www.w3resource.com/pandas/series/series-view.php

matleg
  • It can't be a copy, because modifying view also modifies df. You can check this with view.iloc[1, 1] = 500. By the way, I took the example from the pandas docs, which also call it a view: https://pandas.pydata.org/docs/user_guide/copy_on_write.html – P.Jo Jul 05 '23 at 09:30
  • OK, my bad: the name is misleading, since view is also a method. Maybe what you are actually looking for is a way to tell whether the two objects "point to" the same memory. For this you can use numpy: print(np.may_share_memory(df, view)). This is the associated post: https://stackoverflow.com/a/64842082/13339621 – matleg Jul 05 '23 at 09:58
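
A minimal sketch of the memory-sharing check suggested in the last comment, in case it helps. The helper name shares_data is made up for illustration and is not part of pandas or numpy. It compares the columns one by one with np.shares_memory and assumes plain numeric columns with unique names, where Series.to_numpy() hands back the underlying buffer without copying; for object or extension dtypes to_numpy() may copy, and the check could then report False even for a view.

```python
import numpy as np
import pandas as pd


def shares_data(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper: True if any same-named column of a and b
    is backed by overlapping memory."""
    common = a.columns.intersection(b.columns)
    return any(
        np.shares_memory(a[col].to_numpy(), b[col].to_numpy())
        for col in common
    )


df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
view = df[:]       # slice of the whole frame, backed by the same data
copy = df.copy()   # deep copy with its own buffers

print(shares_data(df, view))  # True
print(shares_data(df, copy))  # False
```

Note that this only tells you whether two given objects share data at the moment of the call. It does not tell you, from a single DataFrame alone, whether it is "a view of something".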