How Does Python Apply a Method from one Library to the Object of Another?

Question

When using pandarallel to run .apply methods on my dataframes across all cores, I came across a syntax I had never seen before. More precisely, it's a way of using dot syntax that I don't understand.

import pandas as pd
from pandarallel import pandarallel

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['a', 'b'])  # 3 rows, columns 'a' and 'b'

So far so good, just setting up a dataframe. Next, to get pandarallel ready, we do

pandarallel.initialize()

Next up is the bit where I am confused: to use pandarallel we call this method on the dataframe

df.parallel_apply(func)

My question is: if the dataframe df was instantiated using the pandas library, and pandas does not have a method called parallel_apply, how is it that Python knows to use the pandarallel method on the pandas object?

I presume it's something to do with the initialization, but I have never seen this before and I don't understand what's happening in the back end.

score 3 · Answer 1 · answered Aug 25 '20 at 13:50


It appears to happen in pandarallel.initialize():

DataFrame.parallel_apply = parallelize(*args)

It seems that the DataFrame class allows attributes to be added after it has been defined (as ordinary Python classes generally do), and that's what's happening here. parallelize appears to be a factory function that creates functions based on the passed args. Those functions are meant to act as methods, and the one it creates is assigned to DataFrame.parallel_apply, so every DataFrame, including ones that already exist, can call it.
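A minimal sketch of that pattern, using a stand-in class instead of pandas. `Table`, `parallelize` and `initialize` below are hypothetical toy versions written only for illustration, not pandarallel's actual code, and a thread pool is used just to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


class Table:
    """Hypothetical stand-in for pandas.DataFrame: a class from "another library"."""

    def __init__(self, rows):
        self.rows = list(rows)


def parallelize(n_workers):
    """Hypothetical factory: builds and returns a new function to be used as a method."""

    def parallel_apply(self, func):
        # 'self' is the Table instance the method is called on.
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            # pool.map keeps results in the same order as the inputs
            return list(pool.map(func, self.rows))

    return parallel_apply


def initialize(n_workers=2):
    # Attach the generated function to the class itself. Every Table,
    # including ones created before this call, can now use it.
    Table.parallel_apply = parallelize(n_workers)


table = Table([1, 2, 3])  # created before initialize()
initialize()
print(table.parallel_apply(lambda x: x * 10))  # [10, 20, 30]
```

Because attribute lookup on an instance falls back to its class, patching the class once is enough for all existing and future objects.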


Carcigenicate


Thanks for the answer. I accepted Bruno's answer because he explained how the dataframe is passed as the 'self' parameter to any arbitrary function. I do appreciate your explanation of the initialize function in pandarallel.py, though! – Alan Aug 25 '20 at 14:16

score 3 · Accepted Answer · answered Aug 25 '20 at 13:56

You can attach your own methods to a class after it has been defined, and existing objects of that class can use them too:

def my_func(self):
    return 2*self


pd.DataFrame.my_method = my_func

df.my_method()

   a   b
0  2   8
1  4  10
2  6  12

You can even pass arguments:

def sum_x(self, x):
    return self+x

pd.DataFrame.sum_x = sum_x

df.sum_x(3)
   a  b
0  4  7
1  5  8
2  6  9

The first argument receives the object itself (self), just as in a normal method defined inside a class.
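To see where `self` comes from, here is a small sketch with a hypothetical stand-in class `Box` (not part of pandas). Looking up a function through an instance produces a bound method that remembers the instance and passes it as the first argument, so `b.doubled()` is the same call as `doubled(b)`.

```python
class Box:
    """Hypothetical stand-in class used only for this illustration."""

    def __init__(self, value):
        self.value = value


def doubled(self):
    return 2 * self.value


def plus(self, x):
    return self.value + x


# Attach plain functions to the class after it was defined.
Box.doubled = doubled
Box.plus = plus

b = Box(5)
bound = b.doubled                  # a bound method, not the raw function
print(bound.__self__ is b)         # True: the instance is stored on the method
print(bound.__func__ is doubled)   # True: the original function is wrapped
print(b.doubled(), doubled(b))     # 10 10
print(b.plus(3), Box.plus(b, 3))   # 8 8

# A function stored on a single instance is NOT bound: no automatic self.
other = Box(1)
other.tripled = lambda self: 3 * self.value
print(other.tripled(other))        # 3 (self must be passed by hand)
```

The last lines show why such patches go on the class rather than on one object: a function stored on an instance is returned as-is, so nothing fills in `self`.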
