3

I have the following scenario.

import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [['apple'], [], ['romaine', 'potatoes']]}
df = pd.DataFrame(data=d)

So the dataframe is:

   col1   col2
0   1     [apple]
1   2     []
2   3     [romaine, potatoes]

I also have a dictionary:

my_dict = {"apple" : "fruit", "potatoes" : "vegetable", "romaine" : "lettuce"}

I want to create another column "col3" that holds, for each row, the list of values looked up in my_dict for the items in "col2":

   col1   col2                 col3
0   1     [apple]              [fruit]
1   2     []                   []
2   3     [romaine, potatoes]  [lettuce, vegetable]

I want to achieve this with a single line of code using apply, map, and lambda, something like:

df["col3"] = df.col2.apply(map(lambda x: pass if not x else condition_dict[x]))

I am really stuck and wonder if this is possible without writing a separate function and passing it as an argument to apply.

polyglot

3 Answers

5
  • For a sample dataframe with 1M rows, .apply with a list comprehension is about 2.5x faster than .explode() with .groupby(), and slightly faster (about 1.15x) than .map().
  • If there is a NaN in the column, the row must either be dropped with .dropna, or the NaN filled with an empty list.
    • .fillna([]) will not work; pandas does not accept a list as the fill value.
    • Use df.col2 = df.col2.fillna({i: [] for i in df.index}) instead (see the sketch below).
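A minimal sketch of that fill, assuming a hypothetical frame df_nan in which one col2 entry is NaN rather than a list:

import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'col1': [1, 2], 'col2': [['apple'], np.nan]})

# fill each NaN cell with its own empty list, keyed by row index
df_nan.col2 = df_nan.col2.fillna({i: [] for i in df_nan.index})

With clean lists in col2, the mapping itself is a one-liner: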
df['col3'] = df.col2.apply(lambda x: [my_dict.get(v) for v in x])

# display(df)
   col1                 col2                  col3
0     1              [apple]              [fruit]
1     2                   []                   []
2     3  [romaine, potatoes]  [lettuce, vegetable]
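If col2 can contain items that are not keys of my_dict, dict.get returns None for them. One possible variant keeps unmapped items unchanged by passing the item itself as the default:

df['col3'] = df.col2.apply(lambda x: [my_dict.get(v, v) for v in x])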

%timeit test

# test data with 1M rows
d = {'col1': [1, 2, 3], 'col2': [['apple'], [], ['romaine', 'potatoes']]}
df = pd.DataFrame(d)
df = pd.concat([df]*333333)

%%timeit
df.col2.apply(lambda x: [my_dict.get(v) for v in x])
[out]:
453 ms ± 30.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

def scott(d, my_dict):
    e = d.explode('col2')
    e['col3'] = e['col2'].map(my_dict)
    return e.groupby('col1', as_index=False)[['col3']].agg(list).merge(d)

%%timeit
scott(df, my_dict)
[out]:
1.17 s ± 23.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
df.col2.map(lambda x: list(map(my_dict.get, x)))
[out]:
519 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
df['col2'].explode().map(my_dict).groupby(level=0).agg(list)
[out]:
909 ms ± 8.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Trenton McKinney
3

Try this:

dfe = df.explode('col2')
dfe['col3'] = dfe['col2'].map(my_dict)
dfe.groupby('col1', as_index=False)[['col3']].agg(list).merge(df)

Output:

   col1                  col3                 col2
0     1               [fruit]              [apple]
1     2                 [nan]                   []
2     3  [lettuce, vegetable]  [romaine, potatoes]
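Row 1 shows [nan] because exploding an empty list produces a single NaN, which .map then leaves in place. If an empty list is preferred there, one possible cleanup is to filter the NaNs out of the aggregated lists:

out = dfe.groupby('col1', as_index=False)[['col3']].agg(list).merge(df)
# drop the NaN placeholders so the empty-list row becomes [] again
out['col3'] = out['col3'].apply(lambda lst: [v for v in lst if pd.notna(v)])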

Or as a one-liner:

(df.merge(df['col2'].explode()
                    .map(my_dict)
                    .groupby(df['col1'])
                    .agg(list)
                    .rename('col3'),
          left_on='col1',
          right_index=True))
Scott Boston
0
# map each list in col2 through my_dict, item by item
df.col2.map(lambda x: list(map(my_dict.get, x)))
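This produces the same col3 values as the other answers; assigning the result back creates the column:

df['col3'] = df.col2.map(lambda x: list(map(my_dict.get, x)))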
PieCot