Most efficient way of mapping column in pandas DataFrame

Question

I was wondering whether the `map` method is the best option when a column needs a simple mapping, since using `map` or `apply` is usually a bad idea.

score 0 · Answer 1 · answered Aug 05 '20 at 08:43

I compared the following functions for the simple case below. Please share if you have better alternatives.

# Case - Map the random number to its string
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,7,size=(5000,1)), columns=['A'])
dikt = {1:'1',2:'2',3:'3',4:'4',5:'5',6:'6'}

First function - using the `map` method:

def f1():
    df1 = df.copy()
    df1['B'] = df1['A'].map(dikt)
    return df1

Results:

Second function - using the `tolist` method on the column with a list comprehension:

def f2():
    df2 = df.copy()
    column_list = df2['A'].tolist()
    df2['B'] = [dikt[i] for i in column_list]
    return df2

Results:

Lucas Hattori

Hmm, 5k rows is a small DataFrame; try testing with 50k, 500k, and 5M rows too. – jezrael Aug 05 '20 at 08:44

You could have just run `%timeit f2()`. `%timeit for x in range(100): f2()` has the overhead of the `for` loop, `range`, and the `df.copy()` calls, and performance should be tested on large DataFrames of at least 100K rows. – Ch3steR Aug 05 '20 at 08:51

Here are the [`timeit` results I got on repl.it](https://repl.it/repls/OccasionalThunderousQueries#main.py): `pd.Series.map` was 5x faster than the list comprehension for a DataFrame of 5 million rows. – Ch3steR Aug 05 '20 at 09:15
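Following the suggestions in the comments, here is a rough sketch of how the comparison could be rerun on larger frames without the extra `for` loop or `df.copy()` inside the timed code. It uses the standard-library `timeit` module instead of the `%timeit` magic so it runs outside IPython. The helper names (`make_df`, `with_map`, `with_list_comp`, `benchmark`), the seed, and the chosen sizes are assumptions for illustration and are not from the original answer. No timings are claimed here; they will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same mapping as in the answer: integers 1..6 to their string form.
dikt = {i: str(i) for i in range(1, 7)}


def make_df(n_rows, seed=0):
    """Hypothetical helper: one column 'A' of random integers in [1, 6]."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"A": rng.integers(1, 7, size=n_rows)})


def with_map(df):
    # Series.map looks each value up in the dict.
    return df["A"].map(dikt)


def with_list_comp(df):
    # Plain Python dict lookup over the column converted to a list.
    return pd.Series([dikt[i] for i in df["A"].tolist()], index=df.index)


def benchmark(sizes=(5_000, 50_000, 500_000), repeats=5):
    """Return {size: (map_seconds, list_comp_seconds)}, best of `repeats` runs."""
    results = {}
    for n in sizes:
        df = make_df(n)
        # Make sure both approaches agree before timing them.
        assert with_map(df).tolist() == with_list_comp(df).tolist()
        t_map = min(timeit.repeat(lambda: with_map(df), number=1, repeat=repeats))
        t_list = min(timeit.repeat(lambda: with_list_comp(df), number=1, repeat=repeats))
        results[n] = (t_map, t_list)
    return results


if __name__ == "__main__":
    for n, (t_map, t_list) in benchmark().items():
        print(f"{n:>9,} rows  map: {t_map:.4f}s  list comp: {t_list:.4f}s")
```

One behavioural difference to keep in mind when comparing the two: `map` with a dict gives `NaN` for values that are not keys of the dict, while the list comprehension raises `KeyError`.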
