I was wondering whether the `map` method is the best option when a simple mapping is needed on a column, since using `map` or `apply` is usually considered a bad idea for performance.

Lucas Hattori

1 Answer

I compared the following functions for the simple case below. Please share if you have better alternatives.

# Case - Map the random number to its string
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,7,size=(5000,1)), columns=['A'])
dikt = {1:'1',2:'2',3:'3',4:'4',5:'5',6:'6'}

First function - using the `map` method:

def f1():
    df1 = df.copy()
    df1['B'] = df1['A'].map(dikt)
    return df1

Results: [image: timing output]

Second function - using the `tolist` method on the column with a list comprehension:

def f2():
    df2 = df.copy()
    column_list = df2['A'].tolist()
    df2['B'] = [dikt[i] for i in column_list]
    return df2

Results: [image: timing output]
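The two approaches can also be timed directly, without the `df.copy()` and loop overhead, across several dataframe sizes. The following is only a minimal sketch: the helper names `with_map`, `with_list_comp`, and `benchmark`, the chosen sizes, and the repeat count are assumptions for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

dikt = {1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6'}


def with_map(df):
    # Series.map looks each value of column A up in the dict
    return df['A'].map(dikt)


def with_list_comp(df):
    # Plain Python dict lookup over the column converted to a list
    return [dikt[i] for i in df['A'].tolist()]


def benchmark(sizes=(5_000, 50_000, 500_000), repeats=5):
    # Hypothetical helper: returns {n_rows: (best_map_time, best_list_comp_time)}
    rng = np.random.default_rng(0)
    results = {}
    for n in sizes:
        df = pd.DataFrame({'A': rng.integers(1, 7, size=n)})
        # Sanity check: both approaches must produce the same values
        assert with_map(df).tolist() == with_list_comp(df)
        t_map = min(timeit.repeat(lambda: with_map(df), number=1, repeat=repeats))
        t_lc = min(timeit.repeat(lambda: with_list_comp(df), number=1, repeat=repeats))
        results[n] = (t_map, t_lc)
    return results


if __name__ == "__main__":
    for n, (t_map, t_lc) in benchmark().items():
        print(f"{n:>9} rows  map: {t_map:.5f}s  list comp: {t_lc:.5f}s")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; the dataframe is built once per size so that only the mapping itself is measured.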

Lucas Hattori
  • Hmmm, 5k is a small dataframe; try testing with 50k, 500k, and 5M rows too. – jezrael Aug 05 '20 at 08:44
  • You could have done `%timeit f2()`. `%timeit for x in range(100): f2()` includes the overhead of the for loop, `range`, and the `df.copy()` calls, and performance should be tested on large dataframes of at least 100K rows. – Ch3steR Aug 05 '20 at 08:51
  • Here are the [`timeit` results I got on repl.it](https://repl.it/repls/OccasionalThunderousQueries#main.py): `pd.Series.map` was 5X faster than the *list comp* for a dataframe of 5 million rows. – Ch3steR Aug 05 '20 at 09:15