My first time here so I hope I don't break anything... I have a list of lists:
Browse[2]> head(str(mylist))
List of 33
$ : chr [1:33] "0001" "space" "28" "night_club" ...
$ : chr [1:33] "0002" "concert" "28" "night_club" ...
$ : chr [1:31] "0003" "night_club" "24" "martial_arts" ...
$ : chr [1:31] "0004" "stage" "24" "basketball" ...
$ : chr [1:43] "0005" "night_club" "16" "concert" ...
$ : chr [1:43] "0006" "night_club" "16" "concert" ...
$ : chr [1:39] "0007" "night_club" "22" "concert" ...
$ : chr [1:39] "0008" "night_club" "22" "concert" ...
$ : chr [1:31] "0009" "night_club" "46" "martial_arts" ...
$ : chr [1:31] "0010" "night_club" "46" "martial_arts" ...
$ : chr [1:41] "0011" "night_club" "17" "martial_arts" ...
$ : chr [1:41] "0012" "night_club" "17" "martial_arts" ...
$ : chr [1:29] "0013" "concert" "23" "night_club" ...
$ : chr [1:29] "0014" "concert" "23" "night_club" ...
$ : chr [1:25] "0015" "night_club" "26" "concert" ...
$ : chr [1:31] "0016" "night_club" "42" "concert" ...
$ : chr [1:31] "0017" "night_club" "42" "concert" ...
$ : chr [1:31] "0018" "night_club" "25" "wrestling" ...
$ : chr [1:31] "0019" "night_club" "25" "wrestling" ...
$ : chr [1:33] "0020" "night_club" "46" "wrestling" ...
$ : chr [1:33] "0021" "night_club" "46" "wrestling" ...
$ : chr [1:41] "0022" "concert" "21" "stage" ...
$ : chr [1:41] "0023" "concert" "21" "stage" ...
$ : chr [1:55] "0024" "basketball" "8" "concert" ...
$ : chr [1:55] "0025" "basketball" "8" "concert" ...
$ : chr [1:37] "0026" "bald_person" "26" "martial_arts" ...
$ : chr [1:37] "0027" "bald_person" "26" "martial_arts" ...
$ : chr [1:37] "0028" "night_club" "32" "business_meeting" ...
$ : chr [1:37] "0029" "night_club" "32" "business_meeting" ...
$ : chr [1:15] "0030" "night_club" "59" "stage" ...
$ : chr [1:37] "0031" "stage" "12" "night_club" ...
$ : chr [1:37] "0032" "stage" "12" "night_club" ...
$ : chr [1:33] "0033" "night_club" "23" "portrait" ...
I want to turn this list into a wide format data frame where the first column would be each of every inner list first element (i.e. "0001", "0002" etc.) and there would be all possible columns with categories exist in the file: "space", "night_club", "concert", "marital_arts", "wrestling" etc. meaning that I would a very wide data frame that each row will begin with some id (0001,0002,0003 ...) and the columns names would be again all categories in the file: "space", "night_club", "concert", "marital_arts", "wrestling" etc. and for each row, where the category exists for that id, it would populate the value next to the category from the list ("space" -> 28 from the first line for example).
I was trying to construct a normalized data frame with loops and then convert it to a wide format, but as data scales it would be a bad idea:
for (file in files){# iterate over files in folder
mylist <- strsplit(readLines(file), ":")
#close(mylist)
for (elem in mylist){
dataframe <- data.frame(frameid = numeric(), category = character(), nrow = length(unlist(elem)))
frameid <- rep.int(elem[[1]], length(elem)-1)
categories <- elem[-1:-1]
dataframe$frameid <- frameid
dataframe$category <- categories
}
}
Reproducible input output example: dput of input:
list(c("0001", "space", "28", "night_club", "25"), c("0002",
"concert", "28", "night_club", "26"), c("0003", "night_club",
"24", "martial_arts", "27"), c("0004", "stage", "24", "basketball",
"30"))
output:
Dataframe
frameid, cat_space, cat_night_club, cat_concert, cat_martial_arts, cat_stage, cat_basketball
0001, 28, 25, 0, 0, 0, 0
0002, 0, 26, 28, 0, 0, 0
0003, 0, 24, 0, 27, 0, 0
0004, 0, 0, 0, 0, 24, 30