
General problem: I have two similar data frames (same shape, same variables, but different values).
How can I apply a function to each cell of df1 according to the value of the same cell in df2?

My specific problem: how can I round each cell of df1 to the number of decimal places of the matching cell in df2?

I did this with a for loop over the columns of my dataframe. I now want to optimize the code using df.applymap() or df.apply(np.vectorize()) to avoid the loop.

Optional: I also want to shuffle these decimal counts within each variable.

The code below works properly but needs to be optimized.

import numpy as np
import pandas as pd 
   
# Count decimal number
def decimal_count(number):
   f = str(number)
   if '.' in f:
       digits = f[::-1].find('.')
   else:
       digits = 0
   return digits 
   
# dataframe I want to round
df_to_round = pd.DataFrame({'Integers' :[1, 2.3, 4.1, 4, 5], 
                  'Float' :[1.1, 2.2, 3.5444, 4.433 ,5.5555]})

# dataframe with correct decimal number
df_rounded = pd.DataFrame({'Integers' :[1, 2, 3, 4, 5], 
                  'Float' :[1.1, 6.233, 3.34, 4.46 ,5.777]})


# round to the right decimal
for column in df_rounded.columns:

   # get decimal 
   df_to_round['decimals'] = df_rounded[column].apply(decimal_count)

   # shuffle decimal level 
   # only if needed
   # df_to_round['decimals'] = np.random.permutation(df_to_round['decimals'].values)

   # Apply round function to df_to_round
   df_to_round[column] = df_to_round[[column, 'decimals']].apply(lambda x : round(x[column], int(x['decimals'])), axis= 1)

   df_to_round.drop(['decimals'], axis = 1, inplace = True)

My main obstacle is how to adapt the "# Apply round function to df_to_round" step to a vectorized method.

jpetot

1 Answer


I usually use swifter for this, as it is the easiest way to speed up apply() in pandas: it vectorizes the function when possible and otherwise falls back to parallel processing.

Install it:

$ pip install -U pandas # upgrade pandas
$ pip install swifter # first time installation
$ pip install swifter[modin-ray] # first time installation including modin[ray]
$ pip install swifter[modin-dask] # first time installation including modin[dask]

$ pip install -U swifter # upgrade to latest version if already installed

Then just use it like so in the code.

Note: it does not work when you use a groupby() before the apply().

import swifter

# round to the right decimal
for column in df_rounded.columns:

   # get decimal 
   df_to_round['decimals'] = df_rounded[column].swifter.apply(decimal_count)

   # shuffle decimal level 
   # only if needed
   # df_to_round['decimals'] = np.random.permutation(df_to_round['decimals'].values)

   # Apply round function to df_to_round
   df_to_round[column] = df_to_round[[column, 'decimals']].swifter.apply(lambda x : round(x[column], int(x['decimals'])), axis= 1)

   df_to_round.drop(['decimals'], axis = 1, inplace = True)
yudhiesh
  • By using swifter, you assume that I can't vectorize my problem. Are you sure about that? – jpetot Jun 28 '21 at 12:57
  • @jpetot bold assumption there, you could just create a function to do the processing and pass it to the `apply()` but then again you can't really do much about it as the `apply()` scales linearly with the amount of data you have. – yudhiesh Jun 28 '21 at 13:01
  • @jpetot you can have a look at the performance of it [here](https://github.com/jmcarpenter2/swifter#vectorizes-your-function-when-possible) – yudhiesh Jun 28 '21 at 13:08
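
For the rounding step the question asks about, one possible vectorized alternative is to scale each cell by a power of ten, round, and scale back, all on whole NumPy arrays. The sketch below is not from the thread: `round_like` and its `shuffle` and `seed` parameters are hypothetical names introduced for illustration. It assumes both frames share the same shape and column names, and it still counts decimals cell by cell with the question's string-based approach. Because it uses `np.round` on scaled floats, it may differ from Python's built-in `round` in rare half-way cases.

```python
import numpy as np
import pandas as pd


def decimal_count(number):
    """Count the digits after the decimal point in str(number)."""
    return len(str(number).partition('.')[2])


def round_like(df_to_round, df_rounded, shuffle=False, seed=None):
    """Hypothetical helper: round each cell of df_to_round to the number
    of decimals of the matching cell in df_rounded."""
    # Per-cell decimal counts as a 2D integer array (still element-wise).
    decimals = df_rounded.apply(lambda s: s.map(decimal_count)).to_numpy()

    if shuffle:
        # Optional: shuffle the decimal counts independently within each column.
        rng = np.random.default_rng(seed)
        decimals = np.column_stack([rng.permutation(col) for col in decimals.T])

    # Vectorized rounding: scale, round to the nearest integer, scale back.
    scale = 10.0 ** decimals
    values = df_to_round[df_rounded.columns].to_numpy(dtype=float)
    rounded = np.round(values * scale) / scale

    return pd.DataFrame(rounded, index=df_to_round.index,
                        columns=df_rounded.columns)


# Usage with the frames from the question.
df_to_round = pd.DataFrame({'Integers': [1, 2.3, 4.1, 4, 5],
                            'Float': [1.1, 2.2, 3.5444, 4.433, 5.5555]})
df_rounded = pd.DataFrame({'Integers': [1, 2, 3, 4, 5],
                           'Float': [1.1, 6.233, 3.34, 4.46, 5.777]})
result = round_like(df_to_round, df_rounded)
```

The per-column loop and the temporary 'decimals' column are no longer needed, since the decimal counts live in an array of the same shape as the data. Note that `np.vectorize` would not help here: it is a convenience wrapper around a Python-level loop rather than a true vectorization.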