How to create flag columns in a pandas DataFrame for each element of a list that appears in another column?

Question

How can I add flag columns to a pandas DataFrame, one per element of a list, marking whether that element appears in another column? Update: the list will change frequently and can be very long, so the flags need to be generated from the list dynamically rather than hard-coded.

Thank you so much.

list =['apple', 'banana', 'peach']

Input dataframe:

Output dataframe:

The list will be updated frequently and can be very dynamic and long. Is there any way to create dynamic flags based on a dynamic list? Thank you. — lionking19063, Feb 01 '22 at 18:33

score 4 · Answer 1 · answered Feb 01 '22 at 16:54

Try exploding the fruit column into one row per fruit name, then pivot the result:

out = df.join(df['fruit'].str.split().explode().reset_index().assign(count=1)
                         .pivot_table('count', 'index', 'fruit', fill_value=0)
                         .add_prefix('flag_'))

Output:

>>> out
                fruit  flag_apple  flag_banana  flag_peach
0        apple banana           1            1           0
1         apple peach           1            0           1
2               peach           0            0           1
3              banana           0            1           0
4               apple           1            0           0
5  apple banana peach           1            1           1
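
For a list that changes often, one alternative sketch (a different technique from the explode-and-pivot above) uses `Series.str.get_dummies` and then reindexes the columns to the current list, so entries that never occur still get an all-zero flag. The extra `'kiwi'` entry is a hypothetical addition used only to show that case, and the sketch assumes fruit names are separated by single spaces.

```python
import pandas as pd

# 'kiwi' is a hypothetical extra entry that never occurs in the data
fruits = ['apple', 'banana', 'peach', 'kiwi']
df = pd.DataFrame({'fruit': ['apple banana', 'apple peach', 'peach',
                             'banana', 'apple', 'apple banana peach']})

flags = (df['fruit'].str.get_dummies(sep=' ')    # one 0/1 column per token found
         .reindex(columns=fruits, fill_value=0)  # keep only listed fruits, add missing ones as 0
         .add_prefix('flag_'))
out = df.join(flags)
print(out)
```

Because the columns are taken from `fruits`, tokens that are not in the list are dropped and the column order follows the list.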

Clever! Learned something new today. – Krishnakanth Allika Feb 01 '22 at 17:13
Very smart! And it can be used with a dynamic list! – lionking19063 Feb 01 '22 at 18:42

Ben Grossmann · Accepted Answer · 2022-02-01T17:22:30.767

Here's a quick implementation of what I think you're trying to do.

import pandas as pd

fruits = ['apple','banana','peach'] # list of fruit
df = pd.DataFrame(                  # build dataframe
    {'fruit':[
        'apple banana',
        'apple peach',
        'peach',
        'banana',
        'apple',
        'apple banana peach']})

for f in fruits:
    df[f'flag_{f}'] = df['fruit'].str.count(f)
print(df)

Resulting output:

                fruit  flag_apple  flag_banana  flag_peach
0        apple banana           1            1           0
1         apple peach           1            0           1
2               peach           0            0           1
3              banana           0            1           0
4               apple           1            0           0
5  apple banana peach           1            1           1
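
Note that `str.count` treats each fruit name as a regular expression and counts every match, so a repeated name gives a value above 1 and a name embedded in a longer word is counted too. A minimal sketch of a stricter 0/1 flag, assuming whole-word matches are what is wanted; the rows `'pineapple'` and `'apple apple peach'` are hypothetical inputs chosen to show the difference:

```python
import re
import pandas as pd

fruits = ['apple', 'banana', 'peach']
# hypothetical rows: an embedded name and a repeated name
df = pd.DataFrame({'fruit': ['apple banana', 'pineapple', 'apple apple peach']})

for f in fruits:
    # escape the name and require word boundaries on both sides
    pattern = rf'\b{re.escape(f)}\b'
    df[f'flag_{f}'] = df['fruit'].str.contains(pattern, regex=True).astype(int)

# for comparison: raw match counts for 'apple'
count_apple = df['fruit'].str.count('apple').tolist()
print(df)
print(count_apple)
```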

Don't forget to modify the column names. – Corralien Feb 01 '22 at 17:01

score 1 · Answer 3 · answered Feb 01 '22 at 17:14

Here is my attempt:

import pandas as pd

fruits = ['apple', 'banana', 'peach']
d = {"fruit": ["apple banana", "apple peach", "peach", "banana", "apple", "apple banana peach"]}

df = pd.DataFrame(d)

# split each string into a list of fruit names
x = []
for elem in d['fruit']:
    x.append(elem.split(" "))

# one flag column per fruit: 1 if the fruit is in that row's list, else 0
for f in fruits:
    df[f'flag_{f}'] = list(map(lambda e: int(f in e), x))
print(df)

I break the strings up into lists first and then check for membership using a lambda to create the new flag columns.

Output:

                fruit  flag_apple  flag_banana  flag_peach
0        apple banana           1            1           0
1         apple peach           1            0           1
2               peach           0            0           1
3              banana           0            1           0
4               apple           1            0           0
5  apple banana peach           1            1           1
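
Splitting first means membership is tested against whole tokens, which can differ from a plain substring test on the original string. A small sketch of that difference; the row `'peaches'` and the `substr_` columns are hypothetical, added only for the comparison:

```python
import pandas as pd

fruits = ['apple', 'banana', 'peach']
# 'peaches' is a hypothetical row that contains 'peach' only as a substring
df = pd.DataFrame({'fruit': ['apple banana', 'peaches', 'peach']})

# one set of tokens per row, built once and reused for every fruit
tokens = [set(s.split()) for s in df['fruit']]

for f in fruits:
    df[f'flag_{f}'] = [int(f in t) for t in tokens]         # whole-token membership
    df[f'substr_{f}'] = [int(f in s) for s in df['fruit']]  # substring membership
print(df)
```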

answered Feb 01 '22 at 17:14 · Richard K Yu

Two points: first, you should generally avoid looping through the rows of a data frame; see [this post](https://stackoverflow.com/a/55557758/2476977) or [this article](https://towardsdatascience.com/you-dont-always-have-to-loop-through-rows-in-pandas-22a970b347ac) for details on that. Second, there is no need to split the elements of the fruit column; `a in b` checks whether string `a` is a substring of string `b`. – Ben Grossmann Feb 01 '22 at 18:27
@BenGrossmann Thanks for taking the time to look through my solution and reply. I will read through these articles - I always wondered why I don't see iterative solutions for questions involving pandas! Turns out there was a reason all along – Richard K Yu Feb 01 '22 at 18:33
Awesome! works quite well. Thank you. – lionking19063 Feb 01 '22 at 18:45
The first time, it ran fine. Now it raises "TypeError: 'list' object is not callable". Any insight? Thanks. – lionking19063 Feb 01 '22 at 19:52
@lionking19063 Are you running the same code exactly or is it using a different input that gives the TypeError? – Richard K Yu Feb 01 '22 at 20:08
@Richard K Yu After restarting the session, it works fine. Thank you. – lionking19063 Feb 01 '22 at 22:49
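
A likely cause, though the thread does not confirm it: if the session still held the question's `list = [...]` assignment, the name `list` pointed at that list rather than the built-in, and the `list(map(...))` call in this answer would fail until a restart cleared it. A minimal sketch reproducing the message; `shadowing_demo` is a hypothetical helper that keeps the shadowing local to one function:

```python
def shadowing_demo():
    # rebinding the name `list` hides the built-in inside this function
    list = ['apple', 'banana', 'peach']
    try:
        list(map(str.upper, list))  # calls the list object, not the built-in
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
print(message)  # 'list' object is not callable
```

Naming the variable something like `fruits`, as the answers do, avoids the clash; in an interactive session `del list` also restores the built-in.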

piterbarg · Answer 4 · 2022-02-01T17:38:17.827

Use `explode` and `unstack`:

(df.assign(f = df['fruit'].str.split())
   .explode('f')
   .assign(v=1)
   .set_index(['fruit','f'])
   .unstack(fill_value=0)
   .droplevel(level=0,axis=1)
   .rename(columns = lambda c : f'flag_{c}')
   .reset_index()
)

output

    fruit                 flag_apple    flag_banana    flag_peach
--  ------------------  ------------  -------------  ------------
 0  apple                          1              0             0
 1  apple banana                   1              1             0
 2  apple banana peach             1              1             1
 3  apple peach                    1              0             1
 4  banana                         0              1             0
 5  peach                          0              0             1
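
The result above comes back sorted by the fruit string, and because the fruit string itself is used as an index level, `unstack` would raise on a frame where the same string appears in two rows. A hedged variant of the same explode-and-unstack idea that keys on the original row index instead; the repeated `'apple'` row is a hypothetical addition, and the sketch assumes no fruit is repeated within a single row:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple banana', 'apple peach', 'peach',
                             'banana', 'apple', 'apple banana peach',
                             'apple']})  # hypothetical duplicate row

flags = (df['fruit'].str.split()
         .explode()                        # one row per (original index, fruit name)
         .to_frame('f')
         .assign(v=1)
         .set_index('f', append=True)['v'] # index levels: original row, fruit name
         .unstack(fill_value=0)            # fruit names become columns
         .add_prefix('flag_'))
out = df.join(flags)                       # original row order is kept
print(out)
```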

edited Feb 01 '22 at 17:38

answered Feb 01 '22 at 17:14

I suggest you: 1. Replace `.unstack().fillna` by `unstack(fill_value=0)`, 2. Replace `.rename(...)` by `.add_prefix('flag_')`. 3. Remove `.astype(int)`. – Corralien Feb 01 '22 at 17:35
excellent tips, will do. actually will keep `rename` as is to show that there are different options – piterbarg Feb 01 '22 at 17:37
