I want to reorder the columns of a subclassed pandas DataFrame.

I gather from this question that there may be better approaches than subclassing a DataFrame, but I'd still like to know how to do it.

Without subclassing, I would do it the usual way:

import pandas as pd

data = {'Description':['mydesc'], 'Name':['myname'], 'Symbol':['mysymbol']}
df = pd.DataFrame(data)

df = df[['Symbol', 'Name', 'Description']]

But with a subclass, the same approach doesn't reorder the columns:

import pandas as pd

class SubDataFrame(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self = self._reorder_columns()
    
    def _reorder_columns(self):
        first_columns = ['Symbol', 'Name', 'Description']
        return self[first_columns + [c for c in self.columns if c not in first_columns]]
    
data = {'Description':['mydesc'], 'Name':['myname'], 'Symbol':['mysymbol']}
df = SubDataFrame(data)

I believe my mistake is reassigning self, which has no effect outside __init__.
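Here is a stripped-down illustration of what I think is happening, without pandas. The Box class and its reversed_copy method are made up for this example and only mirror the shape of my subclass; assigning to self inside __init__ rebinds a local name and leaves the object handed back to the caller untouched.

```python
class Box:
    def __init__(self, items):
        self.items = list(items)
        # This only rebinds the local name `self`; the instance that the
        # caller receives is still the original, unreversed one.
        self = self.reversed_copy()

    def reversed_copy(self):
        # Bypass __init__ so that building the copy does not recurse.
        copy = Box.__new__(Box)
        copy.items = self.items[::-1]
        return copy


box = Box([1, 2, 3])
print(box.items)  # [1, 2, 3], not [3, 2, 1]
```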

How can I achieve column reordering on the subclassed dataframe?

Begoodpy

1 Answer

pandas methods that accept an inplace parameter rely on the private method _update_inplace. You could do the same, but because it is private API, be sure to follow future pandas development in case this method changes:

import pandas as pd

class SubDataFrame(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._update_inplace(self._reorder_columns())
    
    def _reorder_columns(self):
        first_columns = ['Symbol', 'Name', 'Description']
        return self[first_columns + [c for c in self.columns if c not in first_columns]]
    
data = {'Description':['mydesc'], 'Name':['myname'], 'Symbol':['mysymbol']}
df = SubDataFrame(data)

Output:

     Symbol    Name Description
0  mysymbol  myname      mydesc
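
If you would rather not depend on a private method, another option is to reorder the data before handing it to the parent constructor. This is only a sketch and not something pandas prescribes: FIRST_COLUMNS is a name made up here, and unlike the version above it silently skips any preferred column that is missing instead of raising a KeyError.

```python
import pandas as pd

# Hypothetical module-level constant holding the preferred leading columns.
FIRST_COLUMNS = ['Symbol', 'Name', 'Description']


class SubDataFrame(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        # Build a plain DataFrame first so the columns are known.
        plain = pd.DataFrame(*args, **kwargs)
        # Preferred columns first (only those present), then the rest
        # in their original order.
        ordered = [c for c in FIRST_COLUMNS if c in plain.columns]
        ordered += [c for c in plain.columns if c not in FIRST_COLUMNS]
        # Initialise the subclass instance from the reordered frame.
        super().__init__(plain[ordered])


data = {'Extra': [1], 'Description': ['mydesc'],
        'Name': ['myname'], 'Symbol': ['mysymbol']}
df = SubDataFrame(data)
print(list(df.columns))
```

As with the version above, _constructor is not overridden here, so operations on df return plain DataFrame objects rather than SubDataFrame instances.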
mozway