Maybe this is more of a theoretical language question than a pandas question per se. I have a set of extension functions that I'd like to "attach" to e.g. a pandas DataFrame, so that I get the syntactic sugar of calling them as methods instead of explicitly calling utility functions and passing the DataFrame as an argument. Subclassing the pandas DataFrame is not an option either, because the types needed to define and chain the DataFrame constructor, e.g. Axes and Dtype, are inaccessible.

In Scala one can define an implicit class to attach functionality to an object that is otherwise unavailable for extension or too complex to initialize, e.g. the String type can't be extended in Java AFAIR. For example, the following attaches a function to the String type (taken from https://www.oreilly.com/library/view/scala-cookbook/9781449340292/ch01s11.html):

scala> implicit class StringImprovements(s: String) {
    def increment = s.map(c => (c + 1).toChar)
}

scala> val result = "HAL".increment   
result: String = IBM

Likewise, I'd like to be able to do:

# somewhere in scope
def lexi_sort(df):
    """Lexicographically sorts the input pandas DataFrame by index and columns""" 
    df.sort_index(axis=0, level=df.index.names, inplace=True)
    df.sort_index(axis=1, level=df.columns.names, inplace=True)
    return df

df = pd.DataFrame(...)
# some magic and then ...
df.lexi_sort()

One valid possibility is to use the Decorator Pattern, but I was wondering whether Python offers a language-level alternative with less boilerplate, like Scala does.

SkyWalker
  • Related - [What is monkey patching?](https://stackoverflow.com/questions/5626193/what-is-monkey-patching) – user Oct 02 '20 at 16:03
  • Why not utilize the factory pattern to add this functionality to the dataframe at the time of creation and not at some undefined point later in execution? This would avoid passing the dataframe as an argument as well as allow you to standardize what gets added. – Stephen Oct 02 '20 at 16:04
  • pandas has a guide on extending it, https://pandas.pydata.org/pandas-docs/stable/development/extending.html, and this is a possible duplicate of https://stackoverflow.com/questions/22155951/how-to-subclass-pandas-dataframe – Equinox Oct 02 '20 at 16:05
  • @Stephen not possible, you can't replicate the DataFrame constructor as its dependency types are inaccessible. – SkyWalker Oct 02 '20 at 16:06
  • @SkyWalker I don't believe you would need to replicate the constructor of the dataframe. Just define a factory class, instantiate it, and then pass the same args to the factory's construction function. Its construction method then uses those args to create a dataframe, add your additional function, and return the new object. – Stephen Oct 02 '20 at 16:09
  • Can you include an example DataFrame with your [mre]? – wwii Oct 02 '20 at 16:10

2 Answers

In pandas, you can do:

import pandas as pd

def lexi_sort(df):
    """Lexicographically sorts the input pandas DataFrame by index and columns"""
    df.sort_index(axis=0, level=df.index.names, inplace=True)
    df.sort_index(axis=1, level=df.columns.names, inplace=True)
    return df

# Attach the function to the class; every DataFrame instance now has it as a method
pd.DataFrame.lexi_sort = lexi_sort

df = pd.read_csv('dummy.csv')
df.lexi_sort()
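
The assignment above changes `pd.DataFrame` for every caller in the process. The extending guide linked in the comments also describes registering an accessor, which keeps the added methods under their own namespace. The following is only a minimal sketch of that idea: the accessor name `lexi` and the class `LexiAccessor` are hypothetical, and, unlike the function above, this variant returns a sorted copy instead of sorting in place.

```python
import pandas as pd


# "lexi" is a made-up namespace for this sketch, not part of pandas
@pd.api.extensions.register_dataframe_accessor("lexi")
class LexiAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is used on
        self._obj = pandas_obj

    def sort(self):
        """Return a copy sorted by index, then by columns."""
        return self._obj.sort_index(axis=0).sort_index(axis=1)


df = pd.DataFrame({"b": [1, 2], "a": [3, 4]}, index=["y", "x"])
sorted_df = df.lexi.sort()  # the original df is left untouched
```

The call site becomes `df.lexi.sort()` rather than `df.lexi_sort()`, which is slightly longer but makes it obvious that the method is not part of pandas itself.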

For classes that you define yourself, I guess you can simply write the method inside the class to achieve the same outcome, e.g. in a class that holds the DataFrame:

class A():
    def __init__(self, df:pd.DataFrame):
        self.df = df
        self.n = 0

    def lexi_sort(self):
        """Lexicographically sorts the input pandas DataFrame by index and columns"""
        self.df.sort_index(axis=0, level=self.df.index.names, inplace=True)
        self.df.sort_index(axis=1, level=self.df.columns.names, inplace=True)
        return self.df

    def add_one(self):
        self.n += 1

a = A(df)
print(a.n)
a.add_one()
print(a.n)
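
A wrapper like `A` only exposes the methods it defines, so every other DataFrame method has to be reached through `a.df`. One way to cut that boilerplate is to forward unknown attributes to the wrapped object with `__getattr__`. This is a rough sketch under that assumption; `SortableFrame` is a hypothetical name, and the sort returns a new wrapper rather than mutating in place.

```python
import pandas as pd


class SortableFrame:
    """Hypothetical wrapper: adds lexi_sort and forwards everything else."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def lexi_sort(self):
        """Return a new wrapper whose frame is sorted by index and columns."""
        return SortableFrame(self._df.sort_index(axis=0).sort_index(axis=1))

    def __getattr__(self, name):
        # Only called when normal lookup fails, so lexi_sort is not shadowed.
        if name == "_df":
            # Guard against infinite recursion if _df is not set yet
            raise AttributeError(name)
        return getattr(self._df, name)


wrapped = SortableFrame(pd.DataFrame({"b": [1, 2], "a": [3, 4]}, index=["y", "x"]))
result = wrapped.lexi_sort()
```

Attribute access such as `result.columns` or `result.shape` is delegated to the underlying DataFrame. Operators and other special methods (e.g. `result + 1`, `len(result)`) are not forwarded this way, because Python looks those up on the type, not on the instance.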
user2827262
  • thank you! really clean and simple, testing it ... your `A` definition is essentially the Decorator Pattern. – SkyWalker Oct 02 '20 at 16:04

Subclass DataFrame and don't do anything but add your feature.

import pandas as pd
import random, string

class Foo(pd.DataFrame):
    def lexi_sort(self):
        """Lexicographically sorts the input pandas DataFrame by index and columns""" 
        # Use self, not a global df, so the method works on any instance
        self.sort_index(axis=0, level=self.index.names, inplace=True)
        self.sort_index(axis=1, level=self.columns.names, inplace=True)

nrows = 10        
columns = ['b','d','a','c']
rows = [random.sample(string.ascii_lowercase,len(columns)) for _ in range(nrows)]
index = random.sample(string.ascii_lowercase,nrows)

df = Foo(rows,index,columns)

>>> df
   b  d  a  c
w  n  g  u  m
x  t  e  q  k
n  u  x  j  s
u  s  t  u  b
f  g  t  e  j
j  w  b  h  j
h  v  o  p  a
a  q  i  l  b
g  p  i  k  u
o  q  x  p  t
>>> df.lexi_sort()
>>> df
   a  b  c  d
a  l  q  b  i
f  e  g  j  t
g  k  p  u  i
h  p  v  a  o
j  h  w  j  b
n  j  u  s  x
o  p  q  t  x
u  u  s  b  t
w  u  n  m  g
x  q  t  k  e
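
One caveat with a bare subclass is that many DataFrame operations build their result through the `_constructor` property, which returns a plain `pd.DataFrame` by default, so the extra method can be lost after the first operation. The pandas extending guide describes overriding `_constructor` for this reason. A minimal sketch, where `LexiFrame` is a hypothetical name and the sort returns a copy instead of working in place:

```python
import pandas as pd


class LexiFrame(pd.DataFrame):
    """Hypothetical subclass that survives operations returning new frames."""

    @property
    def _constructor(self):
        # Tell pandas to build results of operations as LexiFrame
        return LexiFrame

    def lexi_sort(self):
        """Return a copy sorted by index, then by columns."""
        return self.sort_index(axis=0).sort_index(axis=1)


df = LexiFrame({"b": [1, 2], "a": [3, 4]}, index=["y", "x"])
out = df.lexi_sort()
first_row = df.head(1)  # still a LexiFrame, so lexi_sort stays available
```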
wwii