I have a large dataframe (10M rows, 40 columns, 7 GB in memory). I would like to create a view so that I have a shorthand name for a selection that is complicated to express, without adding another 2-4 GB to memory usage. In other words, I would rather type:

df2

Than:

df.loc[complicated_condition, some_columns]

The documentation states that, while using .loc ensures that setting values modifies the original dataframe, there is still no guarantee as to whether the object returned by .loc is a view or a copy.

I know I could assign the condition and column list to variables (e.g. df.loc[cond, cols]), but I'm generally curious to know whether it is possible to create a view of a dataframe.
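For reference, here is a minimal sketch of that workaround on a small made-up frame. The sample data and the names `cond`, `cols` and `df2` are illustrative assumptions only. The stored boolean mask costs roughly one byte per row, and the selection is only materialised when `df2()` is called:

```python
import numpy as np
import pandas as pd

# Small stand-in for the real 10M-row frame (illustrative data only).
df = pd.DataFrame({
    "a": np.arange(6),
    "b": np.arange(6) * 10.0,
    "c": list("xyzxyz"),
})

# Store the selection itself, not its result.
cond = (df["a"] % 2 == 0) & (df["c"] != "z")  # boolean mask, one bool per row
cols = ["a", "b"]


def df2():
    """Hypothetical shorthand: re-evaluates the selection on each call."""
    return df.loc[cond, cols]


# Reading goes through the shorthand ...
subset = df2()

# ... while writing still has to go through .loc on the original frame.
df.loc[cond, "b"] = df2()["b"] * 2
```

Note that `subset` here is an independent copy, so this gives a short name but not view semantics: later changes to `df` do not show up in `subset`, and assigning to `subset` does not change `df`.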


Edit: Related questions:

JJJ
IanS
  • Did you ever find an answer to this? I want to do the same... – Thomas Mar 29 '18 at 09:22
  • No I haven't! My current understanding is that you cannot control what is a view and what is a copy. You have to trust that memory management (in numpy) is efficient... – IanS Mar 29 '18 at 09:53
  • With respect to your first observation on whether the object returned by .loc is a view or a copy, [this](https://stackoverflow.com/a/23296545/5305519) might answer your question. – user5305519 Jul 26 '18 at 08:41
  • @JattYeo very interesting thanks, I've added a link to the question in my question – IanS Jul 26 '18 at 10:07

1 Answer

You generally can't return a view.

Your answer lies in the pandas docs, in the section "Returning a view versus a copy":

Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.

This answer was found in the following post: Link.
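One way to check this on your own data is `np.shares_memory`. The sketch below is only an illustration under assumptions: it uses `.iloc` and `.loc` because `.ix` was removed in pandas 1.0, the helper `shares_memory_with` and the sample frame are made up, and the result for the slice depends on the pandas version and on how the frame is laid out internally, so treat the printed values as something to verify rather than a guarantee:

```python
import numpy as np
import pandas as pd

# Single-dtype frame built from one 2-D array (illustrative data only).
df = pd.DataFrame(np.arange(12.0).reshape(6, 2), columns=["a", "b"])


def shares_memory_with(parent, child, column="a"):
    """Hypothetical helper: True if `column` overlaps in memory in both frames."""
    return bool(
        np.shares_memory(parent[column].to_numpy(), child[column].to_numpy())
    )


sliced = df.iloc[1:4]         # positional slice
masked = df.loc[df["a"] > 2]  # boolean vector

print("slice shares memory:", shares_memory_with(df, sliced))
print("mask shares memory: ", shares_memory_with(df, masked))
```

On a mixed-dtype frame like the one in the question, the same check can be run per column to see which parts of a selection, if any, avoid a copy.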

Eran Yogev
  • So, the answer to the original poster's question is "no, it's not generally possible to create a view of a pandas dataframe if the condition is complex enough (i.e. contains an array of labels)"? – Anatoly Alekseev Sep 15 '18 at 13:15
  • The top link now redirects to a page without this text (or seemingly any mention of how specifically to return a view vs a copy, only SettingWithCopy warning explanations) – johnDanger Feb 28 '20 at 23:55
  • Edited the answer with @Anatoly Alekseev's input and an updated link. – Eran Yogev Mar 01 '20 at 19:44
  • The latest pandas [docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy) say that they no longer make any guarantee about whether a view or a copy is returned. – Khoi Jul 11 '20 at 08:44
  • A shame. `df_view=df_view.apply(...)` is so much clearer than `df.loc[ind1, ind2]=df.loc[ind1, ind2].apply(...)` and much more likely to fit on a single line. – Jake Stevens-Haas Sep 23 '20 at 21:37