I have a pandas DataFrame in which one column holds a list of tags, roughly as shown below.

Index | User_Details | Tags
------|--------------|-------
0     | A            |[tag_a, tag_b]
1     | B            |-
2     | C            |[tag_a]
....  | ...          |....

Each list holds an unknown, varying number of tags: a user can have none, one, or many, separated by commas. What I am trying to do is turn the column into a boolean (indicator) table like the one shown below:

Index | User_Details | tag_a | tag_b
------|--------------|-------|-------
0     | A            |1      |1
1     | B            |0      |0
2     | C            |1      |0
....  | ...          |....   |...

I found some answers on here that do this when the set of tags is small and known in advance. Those usually dealt with only about 3 tags, but I'm looking at up to 30 or so.

Any ideas?

Thanks

NOTE: This is different from "How to one-hot-encode from a pandas column containing a list?" because some of my tag rows contain no data. Applying any of the methods from that question usually fails with an error along the lines of: `TypeError: object of type 'float' has no len()`
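
For illustration, here is a minimal sketch of the kind of transformation described above. It rests on several assumptions: the sample data is made up to mirror the first table, the rows with no data are assumed to be NaN (which would explain the `float` in the error), and `Series.explode` needs pandas 0.25 or later. The names `tags`, `exploded`, `dummies` and `result` are just placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the table in the question.
# Assumption: the row with no tags holds NaN (a float), not an empty list.
df = pd.DataFrame({
    "User_Details": ["A", "B", "C"],
    "Tags": [["tag_a", "tag_b"], np.nan, ["tag_a"]],
})

# Replace anything that is not a list (e.g. NaN) with an empty list,
# so that later steps never call len() on a float.
tags = df["Tags"].apply(lambda x: x if isinstance(x, list) else [])

# One row per (original index, tag); an empty list becomes a single NaN row.
exploded = tags.explode()

# Indicator columns for every tag seen in the data (NaN rows are all zeros),
# then collapse back to one row per original index.
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

# Attach the indicator columns to the remaining user columns.
result = df.drop(columns="Tags").join(dummies)
print(result)
```

Because the tag columns are derived from whatever values appear in the data, the set of tags does not have to be known up front, so it should not matter whether there are 3 or 30 of them. Rows that already hold an empty list `[]` take the same path as the NaN rows.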

  • Can you print `df.head(5).to_dict()` and paste the output here? It is near impossible reproducing your data. – cs95 Oct 02 '17 at 21:03
  • For the tags section it is: {0: [], 1: [u'RSVP - Kickoff - XXXX San Diego - 09/20/2017'], 2: [u'RSVP - Kickoff - XXXX San Diego - 09/20/2017'], 3: [u'RSVP - Kickoff - XXXX San Diego - 09/20/2017'], 4: [u'RSVP - Kickoff - XXXX San Diego - 09/20/2017']} – Noah Nathan Oct 02 '17 at 21:59
  • Your question has already been answered, it is of no consequence anymore. – cs95 Oct 02 '17 at 22:00
