
I'm using pandarallel to run .apply on my dataframes across all CPU cores, and I came across a syntax I had never seen before. More precisely, it's a use of dot syntax that I don't understand.

import pandas as pd
from pandarallel import pandarallel

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})  # two columns, three rows


So far so good, just setting up a dataframe. Next, to get pandarallel ready, we do

pandarallel.initialize()


Next up is the bit that confuses me: to use pandarallel, we call this method on the dataframe:

df.parallel_apply(func)  # func: any function you would otherwise pass to df.apply


My question is: df was created with the pandas library, and pandas has no method called parallel_apply, so how does Python know to call the pandarallel method on a pandas object?

I presume it has something to do with the initialization, but I have never seen this before and I don't understand what's happening behind the scenes.

Edited by marc_s
Asked by Alan

2 Answers


It appears to happen in initialize:

DataFrame.parallel_apply = parallelize(*args)

Most Python classes, DataFrame included, allow attributes to be added after the class has been defined, and that's what happens here. The new attribute is set on the DataFrame class itself, so every dataframe, including ones created earlier, can use it. parallelize appears to be a factory function that builds a function from the args passed to it; that function is written to act as a method, and it is assigned to DataFrame.parallel_apply.
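A minimal sketch of that pattern, which assumes nothing about pandarallel's real internals: make_parallel_apply is a hypothetical factory standing in for parallelize, and the hypothetical Frame class stands in for pd.DataFrame so the example runs without pandas. A thread pool is used only to keep the sketch short; it is not how pandarallel does its work. The factory returns a plain function whose first parameter is the instance, and assigning that function to the class turns it into a method.

```python
from concurrent.futures import ThreadPoolExecutor


class Frame:
    """Hypothetical stand-in for pd.DataFrame: holds a list of rows."""

    def __init__(self, rows):
        self.rows = rows

    def apply(self, func):
        # The ordinary, sequential version.
        return [func(row) for row in self.rows]


def make_parallel_apply(n_workers):
    """Hypothetical factory: builds a function meant to be used as a method."""

    def parallel_apply(self, func):
        # 'self' is whichever Frame the method is called on.
        # pool.map returns results in the same order as the input rows.
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(func, self.rows))

    return parallel_apply


df = Frame([[1, 4], [2, 5], [3, 6]])  # created BEFORE the method exists

# The equivalent of: DataFrame.parallel_apply = parallelize(*args)
Frame.parallel_apply = make_parallel_apply(n_workers=2)

print(df.parallel_apply(sum))  # [5, 7, 9]
```

Because the method lives on the class, the df created before the assignment can still call it: Python looks attributes up on the class at call time, not when the object is created.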

Carcigenicate
Thanks for the answer. I accepted Bruno's answer because he explained how the dataframe is passed as the 'self' parameter to any arbitrary function, but I do appreciate your explanation of the initialize function in pandarallel.py! – Alan Aug 25 '20 at 14:16

You can add your own methods to an existing class, and objects that were already created pick them up too:

def my_func(self):
    return 2*self


pd.DataFrame.my_method = my_func

df.my_method()

   a   b
0  2   8
1  4  10
2  6  12

You can even pass arguments:

def sum_x(self, x):
    return self + x

pd.DataFrame.sum_x = sum_x

df.sum_x(3)
   a  b
0  4  7
1  5  8
2  6  9

As with an ordinary method defined inside a class, the first argument receives the instance itself (self).
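Why self gets filled in: when you access a function stored on a class through an instance, Python binds it to that instance and hands back a bound method, so df.my_method() is the same call as pd.DataFrame.my_method(df). The sketch below shows this with a hypothetical stand-in class, Grid, so it runs without pandas, and contrasts it with types.MethodType, which attaches a method to a single instance only.

```python
import types


class Grid:
    """Hypothetical stand-in for pd.DataFrame."""

    def __init__(self, values):
        self.values = values


def sum_x(self, x):
    # 'self' is the instance the method was called on.
    return [v + x for v in self.values]


g = Grid([1, 2, 3])       # instance created before the method exists
Grid.sum_x = sum_x        # attach to the class: every instance sees it

print(g.sum_x(3))         # [4, 5, 6]
print(Grid.sum_x(g, 3))   # same call, with self passed explicitly
print(type(g.sum_x))      # <class 'method'>: a method bound to g

# Binding to one instance instead of the whole class:
h = Grid([10, 20])
h.double = types.MethodType(lambda self: [2 * v for v in self.values], h)
print(h.double())          # [20, 40]
print(hasattr(g, "double"))  # False: only h has it
```

Assigning to the class, as both answers do, is what makes the method available on every dataframe; types.MethodType is the narrower option when only one object should get it.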

Bruno Mello