
I have imported a JSON dictionary of several thousand videos with a number of variables. One of the variables is "tags", which holds a different set of tags for each observation.

e.g.

1 c("music", "guitar", "rock")
2 c("music", "diy", "recording")
3 c("hiking", "social")
4 tutorial

I would like to add new columns to the data frame that are dummy variables indicating the presence or absence of each tag:

  music guitar rock diy recording hiking social tutorial
1     1      1    1   0         0      0      0        0
2     1      0    0   1         1      0      0        0
3     0      0    0   0         0      1      1        0
4     0      0    0   0         0      0      0        1

There are similar questions and answers, for example this one, but I am afraid the data structure and objectives are not the same.

My tags are not single strings with embedded delimiters (e.g. c("a,b,c", "c,d")). Each observation holds a character vector built with c(), and in some cases there is only a single tag, as in "tutorial". The set of possible tags is also not known beforehand, so any observation may contribute new columns to the data frame.

Thanks in advance

Flexo
Kuku
  • Please add some sample data using `dput()`. – tmfmnk Aug 19 '19 at 11:33
  • I don't have a complete answer for you, but you can use the following code to get vectors of all tokens for each observation: `apply(abc,1,function(x){unlist(strsplit(x$tags,", "))})`. – user2974951 Aug 19 '19 at 12:03

1 Answer


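One option is `mtabulate()` from the qdapTools package, applied to the list column of tags (assumed here to be `df1$tags`):

    library(qdapTools)

    # Tabulate the tags of each row and bind the resulting columns to df1
    cbind(df1, mtabulate(df1$tags))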
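If installing qdapTools is not an option, the same kind of indicator columns can be sketched in base R. This is only a minimal illustration, not part of the original answer: the data frame `df1`, its `id` column, and the sample tags are assumptions that mirror the example in the question, with `tags` stored as a list column.

```r
# Hypothetical sample data mirroring the question; `tags` is a list column
df1 <- data.frame(id = 1:4)
df1$tags <- list(
  c("music", "guitar", "rock"),
  c("music", "diy", "recording"),
  c("hiking", "social"),
  "tutorial"
)

# Collect every distinct tag, in order of first appearance
all_tags <- unique(unlist(df1$tags))

# Build one row per observation: 1 if the tag is present, 0 otherwise
dummies <- do.call(
  rbind,
  lapply(df1$tags, function(x) as.integer(all_tags %in% x))
)
colnames(dummies) <- all_tags

# Attach the indicator columns to the original identifiers
result <- cbind(df1["id"], dummies)
print(result)
```

Because the columns are derived from `unique(unlist(df1$tags))`, tags do not need to be known beforehand, and a tag repeated within one observation still yields 1 rather than a count.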
akrun