Splitting a list or vector of strings into dummy columns in R

Question

I have imported a JSON dictionary describing thousands of videos, each with several variables. One of the variables is "tags", which differs for each observation.

e.g.

1 c("music", "guitar", "rock")
2 c("music", "diy", "recording")
3 c("hiking", "social")
4 tutorial

I would like to add new columns to the data frame that are dummy variables for the presence or absence of each tag:

  music guitar rock diy recording hiking social tutorial
1     1      1    1   0         0      0      0        0
2     1      0    0   1         1      0      0        0
3     0      0    0   0         0      1      1        0
4     0      0    0   0         0      0      0        1

There are similar questions and answers, for example this one, but I am afraid the data structure and objectives are not the same.

My strings do not contain delimiters themselves (e.g. c("a,b,c", "c,d")); the tags are separated only as elements of a c() vector, and in some cases there is a single tag on its own, as in "tutorial". Also, the possible tags are not known beforehand, so each observation may contribute new columns to the data frame.
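The structure described above can be sketched as a data frame with a list column. This is only an illustration: the names df1, id and tags are assumptions, not taken from the actual JSON import.

```r
# Hypothetical reconstruction of the example; `df1`, `id` and `tags` are assumed names.
df1 <- data.frame(id = 1:4)
df1$tags <- list(
  c("music", "guitar", "rock"),
  c("music", "diy", "recording"),
  c("hiking", "social"),
  "tutorial"
)

# Each row holds its own character vector, so the number of tags varies by row.
tag_counts <- lengths(df1$tags)

# The set of possible tags is discovered from the data rather than fixed in advance.
all_tags <- unique(unlist(df1$tags))
```

Here the "delimiters" are simply the boundaries between vector elements, so no string splitting is involved.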

Thanks in advance

I don't have a complete answer for you, but you can use the following code to get vectors of all tokens for each observation `apply(abc,1,function(x){unlist(strsplit(x$tags,", "))})`. — user2974951, Aug 19 '19 at 12:03

score 1 · Answer 1 · answered Aug 19 '19 at 17:40

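One option is mtabulate() from the qdapTools package, which tabulates each element of the list column into one column per distinct tag:

library(qdapTools)
# mtabulate() returns one row per list element and one column per distinct tag
cbind(df1, mtabulate(df1$tags))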
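If adding a package dependency is not desirable, a similar result can probably be obtained in base R. The following is a minimal sketch, not part of the original answer; it assumes df1 has a list column named tags, and the sample data is a hypothetical reconstruction of the question's example.

```r
# Hypothetical data mirroring the question; `df1`, `id` and `tags` are assumed names.
df1 <- data.frame(id = 1:4)
df1$tags <- list(
  c("music", "guitar", "rock"),
  c("music", "diy", "recording"),
  c("hiking", "social"),
  "tutorial"
)

# Collect every distinct tag, in order of first appearance.
all_tags <- unique(unlist(df1$tags))

# For each observation, mark which of the known tags it contains (1) or lacks (0),
# then stack the per-row indicator vectors into a matrix.
dummies <- do.call(
  rbind,
  lapply(df1$tags, function(x) as.integer(all_tags %in% x))
)
colnames(dummies) <- all_tags

# Append the dummy columns to the original data frame.
result <- cbind(df1, dummies)
```

Unlike a tabulation, this sketch records presence only, so a tag repeated within one observation still yields 1.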

akrun
