I was wondering whether the `map` method is the best option when a simple mapping is needed on a column, since using `map` or `apply` is usually considered a bad idea.
Lucas Hattori
1 Answer
I compared the following functions for the simple case below. Please share if you have better alternatives.
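```python
# Case - Map the random number to its string
import pandas as pd
import numpy as np

# Column A holds integers 1..6 (the upper bound of randint is exclusive)
df = pd.DataFrame(np.random.randint(1, 7, size=(5000, 1)), columns=['A'])
dikt = {1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6'}
```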
First function - using the `map` method:
```python
def f1():
    df1 = df.copy()
    # Series.map looks up each value of column A in the dict
    df1['B'] = df['A'].map(dikt)
    return df1
```
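One behaviour of `map` with a dict that matters when choosing between the two approaches: values that are not keys of the dict come back as `NaN` instead of raising. The toy Series, the `mapping` dict and the `'missing'` fallback below are purely illustrative and not part of the original question.

```python
import pandas as pd

# Toy data (illustrative only): 7 is deliberately missing from the mapping
s = pd.Series([1, 2, 7])
mapping = {1: '1', 2: '2'}

# With a dict argument, values that are not keys of the dict become NaN
mapped = s.map(mapping)
print(mapped.tolist())  # the third element is NaN

# To use a fallback instead of NaN, pass a function that calls dict.get
with_default = s.map(lambda v: mapping.get(v, 'missing'))
print(with_default.tolist())  # ['1', '2', 'missing']
```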
Second function - using the `tolist` method on the column and a list comprehension:
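```python
def f2():
    df2 = df.copy()
    # Pull the column out as a plain Python list...
    column_list = df2['A'].tolist()
    # ...and look up each element in the dict with a list comprehension
    df2['B'] = [dikt[i] for i in column_list]
    return df2
```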
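Two further alternatives that may be worth timing against `f1` and `f2` are sketched below; they are suggestions, not part of the original answer, and `f3`, `f4` and `lookup` are hypothetical names. `f3` assumes every key is a small non-negative integer and every value in `A` is present in `dikt`, and replaces the per-element dict lookup with NumPy fancy indexing into a lookup array. `f4` relies on the fact that this particular mapping is just integer-to-string conversion, so `astype(str)` produces the same strings without a dict at all. Note also that `dikt[i]` in `f2` raises `KeyError` for any value missing from the dict.

```python
import numpy as np
import pandas as pd

# Same setup as in the answer: integers 1..6 mapped to their string form
df = pd.DataFrame(np.random.randint(1, 7, size=(5000, 1)), columns=['A'])
dikt = {1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6'}


def f3():
    # Assumes keys are small non-negative ints and every value of A is a key
    df3 = df.copy()
    # Lookup array where position k holds dikt[k] (unused slots stay None)
    lookup = np.empty(max(dikt) + 1, dtype=object)
    for k, v in dikt.items():
        lookup[k] = v
    # NumPy fancy indexing performs all lookups in one vectorized call
    df3['B'] = lookup[df3['A'].to_numpy()]
    return df3


def f4():
    # This particular mapping is just int -> str, so a dtype cast suffices
    df4 = df.copy()
    df4['B'] = df4['A'].astype(str)
    return df4


# Both alternatives should agree with the map-based result
expected = df['A'].map(dikt).tolist()
assert f3()['B'].tolist() == expected
assert f4()['B'].tolist() == expected
```

`f3` generalizes to any mapping whose keys are small integers, while `f4` only works because here each value maps to its own string representation.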

Lucas Hattori
- hmmm, 5k is a small dataframe; try testing with 50k, 500k and 5M rows too. – jezrael Aug 05 '20 at 08:44
- You could have done `%timeit f2()`... `%timeit for x in range(100): f2()` includes the overhead of the `for` loop, `range` and the `df.copy()` calls, and performance should be tested on large dfs of at least 100K rows. – Ch3steR Aug 05 '20 at 08:51
- Here are the [`timeit` results I did on repl.it](https://repl.it/repls/OccasionalThunderousQueries#main.py): `pd.Series.map` was 5X faster than the *list comp* for a dataframe of size 5 million. – Ch3steR Aug 05 '20 at 09:15
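Following the suggestions in these comments, a minimal benchmarking sketch could time only the mapping step across several sizes with `timeit.repeat`, building the input Series once per size outside the timed code so that neither `df.copy()` nor an explicit loop is measured. The function names (`map_version`, `listcomp_version`, `benchmark`) and the default sizes are illustrative choices, not taken from the original post.

```python
import timeit

import numpy as np
import pandas as pd

dikt = {1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6'}


def map_version(s):
    # pandas Series.map with a dict
    return s.map(dikt)


def listcomp_version(s):
    # Convert to a Python list, then look up each element in the dict
    return [dikt[i] for i in s.tolist()]


def benchmark(sizes=(5_000, 50_000, 500_000), repeat=3, number=1):
    """Return {size: {function name: best time in seconds}}.

    Only the mapping call is timed: the input Series is built once per
    size, outside the timed code, and no df.copy() is involved.
    """
    results = {}
    for n in sizes:
        s = pd.Series(np.random.randint(1, 7, size=n))
        results[n] = {
            fn.__name__: min(timeit.repeat(lambda: fn(s), repeat=repeat, number=number))
            for fn in (map_version, listcomp_version)
        }
    return results


if __name__ == '__main__':
    for n, timings in benchmark().items():
        print(n, {name: f'{t:.4f}s' for name, t in timings.items()})
```

Passing `sizes=(5_000, 50_000, 500_000, 5_000_000)` covers the 5-million-row case mentioned above. Taking the minimum over repeats follows the `timeit` documentation's advice that the lowest value is the best estimate of how fast the code can run.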