0

I have the following Pandas Dataframe.

data = pd.DataFrame(
{
    "client": ["first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "ninth", "tenth", "eleventh"],
    "Lifetime": [24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24],
    "Tokens": [30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30],
    "path": ["kyc", "co", "5dimes", "la", "la", "ku", "pv", "ipv", "lv", "7d", "222"],
    "requiredFields": [
        ['address', 'city', 'country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'ssn', 'state', 'zip'],
        ['address', 'country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'ssn', 'state', 'zip'],
        ['address', 'country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'state', 'zip'],
        ['city', 'country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'ssn', 'state', 'zip'],
        ['city', 'country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'ssn', 'zip'],
        ['city', 'country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'ssn'],
        ['city', 'country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'state', 'zip'],
        ['country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'ssn', 'state', 'zip'],
        ['country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'ssn', 'zip'],
        ['country', 'dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName', 'state', 'zip'],
        ['dobDay', 'dobMonth', 'dobYear', 'firstName', 'lastName']
    ],
    "userIdRequired": [True, True, True, True, True, True, True, True, True, True, True],
}

) What I want to do is to make each item in the list go to a separate column. The result is a list item as a column name and its value "y". Something like this.

client Lifetime Tokens path requiredFields userIdRequired address city country dobDay dobMonth dobYear firstName lastName ssn state zip
first 24 30 kyc [address, city, country, dobDay, dobMonth, dobYear, firstName, lastName, ssn, state, zip] True y y y y y y y y y y y
second 24 30 co [address, city, country, dobDay, dobMonth, dobYear, firstName, lastName, ssn, state, zip] True y None y y y y y y y y y
third 24 30 5dimes [address, city, country, dobDay, dobMonth, dobYear, firstName, lastName, state, zip] True y y y y y y y y y y
fourth 24 30 la [city, country, dobDay, dobMonth, dobYear, firstName, lastName, ssn, state, zip] True None y y y y y y y y y y
fifth 24 30 la [city, country, dobDay, dobMonth, dobYear, firstName, lastName, ssn, zip] True None y y y y y y y y None y
sixth 24 30 ku [city, country, dobDay, dobMonth, dobYear, firstName, lastName, ssn] True None y y y y y y y y None None
seventh 24 30 pv [city, country, dobDay, dobMonth, dobYear, firstName, lastName, state, zip] True None y y y y y y y None y y
eighth 24 30 ipv [country, dobDay, dobMonth, dobYear, firstName, lastName, ssn, state, zip] True None None y y y y y y y y y
ninth 24 30 lv [country, dobDay, dobMonth, dobYear, firstName, lastName, ssn, zip] True None None y y y y y y y None y
tenth 24 30 7d [country, dobDay, dobMonth, dobYear, firstName, lastName, state, zip] True None None y y y y y y None y y
eleventh 24 30 222 [dobDay, dobMonth, dobYear, firstName, lastName] True None None None y y y y y None None None

I can't use apply pandas series or explode or something similar, because then I will have different value order by columns. I also tried to use but with this solution Pandas split a column of unequal length lists into multiple boolean columns, but it generates duplicated columns.

milevskyid
  • 123
  • 1
  • 1
  • 10
  • Does this answer your question? [Pandas convert a column of list to dummies](https://stackoverflow.com/questions/29034928/pandas-convert-a-column-of-list-to-dummies) – Ynjxsjmh Jun 21 '22 at 12:14
  • @Ynjxsjmh no, it is something I also tried and it does not generate what I expect. This solution gives me all list elements in the row as column names and with this, it generates as many columns as many different lists I will have. – milevskyid Jun 21 '22 at 12:57
  • [RESOLVED]: required_fields_s = data['requiredFields'].explode() data = data.join(pd.crosstab(required_fields_s.index,required_fields_s).replace({0: None, 1: 'y'})) – milevskyid Jun 23 '22 at 09:12

1 Answers1

0

You can do

out = data.join(data['requiredFields'].str[0].str.get_dummies(sep=', ').replace({0: None, 1: 'y'}))
print(out)

   client  Lifetime  Tokens    path  \
0   first        24      30     kyc
1  second        24      30      co
2   third        24      30  5dimes
3  fourth        24      30      la

                                          requiredFields  userIdRequired  \
0        [country, dobDay, dobMonth, dobYear, firstName]            True
1          [address, country, dobDay, dobMonth, dobYear]            True
2  [city, country, dobDay, dobMonth, dobYear, firstName]            True
3       [dobDay, dobMonth, dobYear, firstName, lastName]            True

  address  city country dobDay dobMonth dobYear firstName lastName
0    None  None       y      y        y       y         y     None
1       y  None       y      y        y       y      None     None
2    None     y       y      y        y       y         y     None
3    None  None    None      y        y       y         y        y
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • This solution does work for the data presented in the example, thank you. But it turns out, that the example data given here has differences from the original I am working with, and when I applied your solution to my original data it gave me problems - it does not show all the columns. I apologize, but I've already edited the example, I hope you can see if you can find a solution. – milevskyid Jun 22 '22 at 09:38