I have a data frame in which some cells hold several values joined by commas. For example:
df <- data.frame(
c(2012,2012,2012,2013,2013,2013,2014,2014,2014)
,c("a,b,c","d,e,f","a,c,d,c","a,a,a","b","c,a,d","g","a,b,e","g,h,i")
)
names(df) <- c("year", "type")
I want to get it into a form close to what dcast produces, with year, a, b, c, etc. as the columns and the frequency of each value across the data frame in the cells. I first tried colsplit on df and then dcast, but that seems to work only if I aggregate on one of the split columns instead of all of them.
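library(reshape2)  # provides colsplit() and dcast()
# split type into up to five columns, then cast on the first one only
df2 <- data.frame( df$year, colsplit(df$type, ',' , c('v1','v2','v3','v4','v5')) )
df3 <- dcast(df2, df.year ~ v1)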
This result only gives me counts for the first column produced by colsplit (v1), not for all of them. Am I close to a solution, or should I be using a different approach entirely?
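To make the target concrete, here is a minimal base R sketch of the table I am after. It assumes the goal is one row per year and one column per distinct comma-separated value, with counts in the cells. It does not use colsplit or dcast at all, and the names parts, long, counts, and wide are only illustrative.

```r
# Same data as above, with named columns
df <- data.frame(
  year = c(2012, 2012, 2012, 2013, 2013, 2013, 2014, 2014, 2014),
  type = c("a,b,c", "d,e,f", "a,c,d,c", "a,a,a", "b", "c,a,d", "g", "a,b,e", "g,h,i"),
  stringsAsFactors = FALSE
)

# Split each cell on commas; parts is a list with one character vector per row
parts <- strsplit(df$type, ",", fixed = TRUE)

# Long form: repeat each year once per value found in that row
long <- data.frame(
  year = rep(df$year, lengths(parts)),
  type = unlist(parts),
  stringsAsFactors = FALSE
)

# Cross-tabulate year against value to get the frequencies
counts <- table(long$year, long$type)

# Plain data frame with years as row names and a, b, c, ... as columns
wide <- as.data.frame.matrix(counts)
```

With this data, the 2012 row should hold a = 2, b = 1, c = 3, d = 2, e = 1, f = 1 and zeros elsewhere.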