I have a DataFrame in which each row has a column of comma-separated text in no particular order (top_categories). That text is then split into individual columns (Category 1, Category 2, etc.) based on each value's position within the top_categories column.

How can I get to my desired output, where each category (Category A, Category B, etc.) is its own column (say, ordered alphabetically), holding a 1 or a 0 to indicate whether that category appears in that row's top_categories column?

Any help would be appreciated! I've also linked sample data in the hyperlinks above showing what my data currently looks like (tab 1) and what I'm trying to achieve (tab 2). Thanks!

S.Zhong
  • e.g. use last paragraph of this [solution](https://stackoverflow.com/a/39358924). – jezrael Nov 12 '20 at 11:19
  • @jezrael Hi jezrael! Just looked through the linked solution you gave, and it doesn't seem to be the same as what I'm trying to do (at least to me). What I'm trying to do isn't simply splitting a column of delimited values of different lengths, but to have each of those column values become their own separate columns, with a 1 or 0 to indicate whether or not it applies to the original column value (see linked sample data). – S.Zhong Nov 12 '20 at 11:38
  • oops, you are right, changed dupe link. – jezrael Nov 12 '20 at 11:40
  • @jezrael oh yea, wow, this is perfect, thank you! – S.Zhong Nov 12 '20 at 11:44
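
For readers who land here, a minimal sketch of one common way to build this kind of 1/0 layout in pandas, using `Series.str.get_dummies`. This is not necessarily what the linked duplicate recommends. Only the column name `top_categories` comes from the question; the sample rows and the names `cleaned`, `indicators`, and `result` are made up for illustration, and the whitespace normalisation assumes the separators may be spaced inconsistently.

```python
import pandas as pd

# Hypothetical sample data: only the column name comes from the question.
df = pd.DataFrame({
    "top_categories": [
        "Category B, Category A",
        "Category C",
        "Category A,Category C , Category B",
    ]
})

# Normalise spacing around the commas so "A, B" and "A ,B" split identically.
cleaned = (
    df["top_categories"]
    .str.strip()
    .str.replace(r"\s*,\s*", ",", regex=True)
)

# One 0/1 indicator column per distinct category found in the strings.
indicators = cleaned.str.get_dummies(sep=",")

# Make the alphabetical column order explicit.
indicators = indicators.sort_index(axis=1)

# Attach the indicator columns to the original frame.
result = df.join(indicators)
print(result)
```

If the individual Category 1, Category 2, ... columns are no longer needed, they can be dropped before the join; the indicator columns depend only on `top_categories`.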

0 Answers