1

given the data set

#Create Series
s = pd.Series([[1,2,3,],[1,10,11],[2,11,12]],['buz','bas','bur'])
k = pd.Series(['y','n','o'],['buz','bas','bur'])

#Create DataFrame df from two series
df = pd.DataFrame({'first':s,'second':k})

I was able to create new columns based on all possible values of 'first'

def text_to_list(df,col):
    val=df[col].explode().unique()
    return val

unique=text_to_list(df,'first')

for options in unique :
    df[options]=0


now I need to check off (or turn the value to '1') in each row and column where that value exists in the original list of 'first'

I'm pretty sure its a combination of .isin and/or .apply, but i'm struggling

the end result should be for row
buz: cols 1,2,3 are 1
bas: cols 1,10,11 are 1
bur: cols 2,11,12 are 1

          first   second  1  2  3  10  11  12
buz     [1, 2, 3]      y  1  1  1   0   0   0
bas    [1, 10, 11]     n  1  0  0   1   1   0
bur    [2, 11, 12]     o  0  1  0   0   1   1

adding the solution provided by -https://stackoverflow.com/users/3558077/ashutosh-porwal


df1=df.join(pd.get_dummies(df['first'].apply(pd.Series).stack()).sum(level=0))
print(df1)

Note: this solution did not require my hack job of creating the columns beforehand by explode column 'first'

kdot
  • 49
  • 7
  • what output do you expect in your example? – Tranbi Nov 03 '21 at 08:11
  • 1
    Does this answer your question? https://stackoverflow.com/a/29036042/3558077 Try: ```df.join(pd.get_dummies(df['first'].apply(pd.Series).stack()).sum(level=0))``` – porrrwal Nov 03 '21 at 08:16

3 Answers3

1

From your update it seems that what you need is simply:

for opt in unique :
    df[opt]=df['first'].apply(lambda x: int(opt in x))

Output:

           first second  1  2  3  10  11  12
buz    [1, 2, 3]      y  1  1  1   0   0   0
bas  [1, 10, 11]      n  1  0  0   1   1   0
bur  [2, 11, 12]      o  0  1  0   0   1   1
Tranbi
  • 11,407
  • 6
  • 16
  • 33
  • was able to take it a step further for text: df[opt]=df['first'].apply(lambda x: 'yes' if opt in x else 'no') – kdot Nov 03 '21 at 09:52
0

Data:

>>> import pandas as pd
>>> s = pd.Series([[1,2,3,],[1,10,11],[2,11,12]],['buz','bas','bur'])
>>> k = pd.Series(['y','n','o'],['buz','bas','bur'])
>>> df = pd.DataFrame({'first':s,'second':k})
>>> df
           first second
buz    [1, 2, 3]      y
bas  [1, 10, 11]      n
bur  [2, 11, 12]      o

Solution:

>>> df[df['first'].explode().to_list()] = 0
>>> df = df[['first', 'second']].join(df.apply(lambda x:x.loc[x['first']], axis=1).replace({0 : 1, np.nan : 0}).astype(int))
>>> df 

           first second  1  2  3  10  11  12
buz    [1, 2, 3]      y  1  1  1   0   0   0
bas  [1, 10, 11]      n  1  0  0   1   1   0
bur  [2, 11, 12]      o  0  1  0   0   1   1

Amir saleem
  • 1,404
  • 1
  • 8
  • 11
0

Use pd.merge and pivot_table:

out = df.reset_index().explode('first') \
        .pivot_table(values='index', index='second', columns='first',
                     aggfunc='any', fill_value=False, sort=False).astype(int)

out = df.merge(out, on='second')

Output:

>>> out
         first second  1  2  3  10  11  12
0    [1, 2, 3]      y  1  1  1   0   0   0
1  [1, 10, 11]      n  1  0  0   1   1   0
2  [2, 11, 12]      o  0  1  0   0   1   1
Corralien
  • 109,409
  • 8
  • 28
  • 52