pandas translate from a column that is a list to create new columns with all options as a binary yes/no if the value exists in the original list

Question

given the data set

#Create Series
s = pd.Series([[1,2,3,],[1,10,11],[2,11,12]],['buz','bas','bur'])
k = pd.Series(['y','n','o'],['buz','bas','bur'])

#Create DataFrame df from two series
df = pd.DataFrame({'first':s,'second':k})

I was able to create new columns based on all possible values of 'first'

def text_to_list(df,col):
    val=df[col].explode().unique()
    return val

unique=text_to_list(df,'first')

for options in unique :
    df[options]=0

now I need to check off (or turn the value to '1') in each row and column where that value exists in the original list of 'first'

I'm pretty sure its a combination of .isin and/or .apply, but i'm struggling

the end result should be for row
buz: cols 1,2,3 are 1
bas: cols 1,10,11 are 1
bur: cols 2,11,12 are 1

          first   second  1  2  3  10  11  12
buz     [1, 2, 3]      y  1  1  1   0   0   0
bas    [1, 10, 11]     n  1  0  0   1   1   0
bur    [2, 11, 12]     o  0  1  0   0   1   1

adding the solution provided by -https://stackoverflow.com/users/3558077/ashutosh-porwal


df1=df.join(pd.get_dummies(df['first'].apply(pd.Series).stack()).sum(level=0))
print(df1)

Note: this solution did not require my hack job of creating the columns beforehand by explode column 'first'

Does this answer your question? https://stackoverflow.com/a/29036042/3558077 Try: ```df.join(pd.get_dummies(df['first'].apply(pd.Series).stack()).sum(level=0))``` — porrrwal, Nov 03 '21 at 08:16

score 1 · Accepted Answer · answered Nov 03 '21 at 08:27

1

From your update it seems that what you need is simply:

for opt in unique :
    df[opt]=df['first'].apply(lambda x: int(opt in x))

Output:

           first second  1  2  3  10  11  12
buz    [1, 2, 3]      y  1  1  1   0   0   0
bas  [1, 10, 11]      n  1  0  0   1   1   0
bur  [2, 11, 12]      o  0  1  0   0   1   1

answered Nov 03 '21 at 08:27

Tranbi

11,407
6
16
33

was able to take it a step further for text: df[opt]=df['first'].apply(lambda x: 'yes' if opt in x else 'no') – kdot Nov 03 '21 at 09:52

Amir saleem · Answer 2 · 2021-11-03T08:37:41.573

Data:

>>> import pandas as pd
>>> s = pd.Series([[1,2,3,],[1,10,11],[2,11,12]],['buz','bas','bur'])
>>> k = pd.Series(['y','n','o'],['buz','bas','bur'])
>>> df = pd.DataFrame({'first':s,'second':k})
>>> df
           first second
buz    [1, 2, 3]      y
bas  [1, 10, 11]      n
bur  [2, 11, 12]      o

Solution:

>>> df[df['first'].explode().to_list()] = 0
>>> df = df[['first', 'second']].join(df.apply(lambda x:x.loc[x['first']], axis=1).replace({0 : 1, np.nan : 0}).astype(int))
>>> df 

           first second  1  2  3  10  11  12
buz    [1, 2, 3]      y  1  1  1   0   0   0
bas  [1, 10, 11]      n  1  0  0   1   1   0
bur  [2, 11, 12]      o  0  1  0   0   1   1

Corralien · Answer 3 · 2021-11-03T08:39:21.483

Use pd.merge and pivot_table:

out = df.reset_index().explode('first') \
        .pivot_table(values='index', index='second', columns='first',
                     aggfunc='any', fill_value=False, sort=False).astype(int)

out = df.merge(out, on='second')

Output:

>>> out
         first second  1  2  3  10  11  12
0    [1, 2, 3]      y  1  1  1   0   0   0
1  [1, 10, 11]      n  1  0  0   1   1   0
2  [2, 11, 12]      o  0  1  0   0   1   1

pandas translate from a column that is a list to create new columns with all options as a binary yes/no if the value exists in the original list

3 Answers3