17

From the reindex docs:

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Therefore, I thought that setting copy=False would reorder the DataFrame in place (!). It appears, however, that I still get a copy and need to assign it back to the original object. I'd like to avoid assigning it back if I can (the reason comes from this other question).

This is what I am doing:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5))

df.columns = [ 'a', 'b', 'c', 'd', 'e' ]

df.head()

Outs:

          a         b         c         d         e
0  0.234296  0.011235  0.664617  0.983243  0.177639
1  0.378308  0.659315  0.949093  0.872945  0.383024
2  0.976728  0.419274  0.993282  0.668539  0.970228
3  0.322936  0.555642  0.862659  0.134570  0.675897
4  0.167638  0.578831  0.141339  0.232592  0.976057

Reindex gives me the correct output, but I'd need to assign it back to the original object, which is what I wanted to avoid by using copy=False:

df.reindex( columns=['e', 'd', 'c', 'b', 'a'], copy=False )

The desired output after that line is:

          e         d         c         b         a
0  0.177639  0.983243  0.664617  0.011235  0.234296
1  0.383024  0.872945  0.949093  0.659315  0.378308
2  0.970228  0.668539  0.993282  0.419274  0.976728
3  0.675897  0.134570  0.862659  0.555642  0.322936
4  0.976057  0.232592  0.141339  0.578831  0.167638

Why is copy=False not working in place?

Is it possible to do that at all?


Working with Python 3.5.3, pandas 0.23.3

cs95
Luis
  • See https://github.com/pandas-dev/pandas/issues/21598; you need to assign it back: `df = df.reindex(columns=['e', 'd', 'c', 'b', 'a'])` – BENY Jun 05 '19 at 14:05

2 Answers

20

reindex is a structural change, not a cosmetic or transformative one. Whenever the labels actually change, a new object is returned, because the operation cannot be done in place (it would require allocating new memory for the underlying arrays, among other things). This means you have to assign the result back; there is no other choice.

df = df.reindex(['e', 'd', 'c', 'b', 'a'], axis=1)  

Also see the discussion on GH21598.
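To make the point above concrete, here is a minimal sketch (not taken from the linked discussion) showing that a bare reindex call leaves the original DataFrame untouched and that only assigning the result back changes what `df` refers to. The names `new_order`, `result`, `unchanged` and `is_new_object` are made up for illustration, and the `copy` argument is left out because it makes no difference when the labels really change.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
new_order = ["e", "d", "c", "b", "a"]
before = list(df.columns)

# reindex returns a reordered object; df itself is not modified.
result = df.reindex(columns=new_order)

unchanged = list(df.columns) == before   # df still has its old column order
is_new_object = result is not df         # the result is a different object

# Assigning the result back is what makes the reordering "stick".
df = result

print(unchanged, is_new_object)
print(list(df.columns))
```

After the final assignment, `df` refers to the reordered frame; any other variable that still points at the old object keeps seeing the original column order.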


The one corner case where copy=False is actually of any use is when the indices used to reindex df are identical to the ones it already has. You can check by comparing the ids:

id(df)
# 4839372504

id(df.reindex(df.index, copy=False)) # same object returned 
# 4839372504

id(df.reindex(df.index, copy=True))  # new object created - ids are different
# 4839371608  
cs95
  • _Struggling to understand this... think..._ Does it mean that `copy=` was implemented to use with `True`, but not with `False`? The corner case you mean is... to... reindex and assign to a new dataframe, keeping the pre-indexed too? – Luis Jun 05 '19 at 14:23
  • @Luis if a call to reindex doesn't actually result in a DataFrame being reindexed, then would you want pandas to waste time generating a copy of the data you already have? (I don't know, but the argument exists, so I assume it is useful for somebody somewhere.) – cs95 Jun 05 '19 at 14:29
  • Interesting indeed. And misleading, if I may say. Anyhow, thanks, and, if you happen to learn something new, I'd be happy to hear about it :) – Luis Jun 05 '19 at 14:39
  • Btw, I'd like to leave the question open for a little while, maybe somebody else has something to add... – Luis Jun 05 '19 at 14:41
  • @Luis I'd imagine you'd set `copy=False` if you don't know beforehand what you're reindexing with and want the most performance out of your code (obviously, not generating a copy of a lot of data if you don't need to is going to be faster - about 5x faster by my tests). – cs95 Jun 05 '19 at 14:41
  • That use case would make sense, sure: `if cond: new_index = something; else: new_index = current_index` → `copy=False`, for those `else` cases; I'd pack the `reindex` inside the `if`, though :P – Luis Jun 05 '19 at 14:47
  • @Luis Sure, it doesn't matter either way if it's a standalone operation, but it's actually useful when you're trying to chain a lot of methods, something that is frequently done in pandas. Here's an example of a one line solution to a complex reshaping operation thanks to the power of method chaining: https://stackoverflow.com/a/50731254/4909087 – cs95 Jun 05 '19 at 14:50
-1

A bit off topic, but I believe this would rearrange the columns in place:

    for i, colname in enumerate(list_of_columns_in_desired_order):
        col = dataset.pop(colname)
        dataset.insert(i, colname, col)
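The loop above can be tried out with a small self-contained sketch. The helper name `reorder_columns_inplace` and the sample data are assumptions added for illustration; only `DataFrame.pop` and `DataFrame.insert` come from the snippet itself. It checks that the same object is mutated, which is a statement about object identity only: pandas may still move or copy column data internally.

```python
import numpy as np
import pandas as pd


def reorder_columns_inplace(frame, order):
    """Hypothetical helper: move the columns of `frame` into `order`, mutating `frame`."""
    for position, name in enumerate(order):
        column = frame.pop(name)              # remove the column, keep its data
        frame.insert(position, name, column)  # re-insert it at the target position


df = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
same_object = df                 # second reference to the very same DataFrame
expected_a = df["a"].copy()      # remember column 'a' to check the data survives

reorder_columns_inplace(df, ["e", "d", "c", "b", "a"])

print(list(df.columns))          # ['e', 'd', 'c', 'b', 'a']
print(df is same_object)         # True
```

Because no assignment to `df` happens, every other reference to the frame sees the new column order as well, which is the behaviour the question was after.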
Matek