I am using R to prepare some data for a D3 visualization. The visualization was created using the following structure (this is a single row from a .csv file that is subsequently converted to JSON in javascript).
Joe.Schmoe, joe.schmoe@email.com, Sao Paulo, ["Community01", "Community02", "Community03"],
["workgroup01","workgroup02"]
This is a single row. The headers would be:
Person, Email, Location, Communities, Workgroups
You'll notice that the Communities and Workgroup columns contain lists. Furthermore, these lists will vary in length depending on what Communities and Workgroups each individual is associated with. I recognize that this is probably not best practice with regard to data "tidyness," but it is what this viz is expecting.
So ... in R (which I'm learning), I'm finding it impossible to recreate this structure because, when I try to populate the "communities" or "workgroups" variables, R seems to expect that each variable will be of equal length.
The code that I have is reading from a data.frame which is list of the members of a particular community, and adding the name of that community to a column in a master data.frame of all employees. I'm indexing by email address because it is unique. So this particular loop looks at each individual email address in a data.frame called "commTD" and finds it in a master data.frame called "testr." If it finds it, it looks at the communities variable and either replaces an NA value with the name of the community (in this case "Technical Design"), or if the vector already exists, appends Technical Design to it:
for(i in commTD$email){
if(i %in% testr$email){
tmpList <- testr[which(testr$email ==i) , 'communities']
if(is.na(tmpList)){
tmpList <- list(c("Technical Design"))
}
else{
tmpList <- append(tmpList[[1]][1], 'Technical Design')
}
testr[which(testr$email ==i) , 'communities'] <- list(tmpList)
}
}
This works fine for the initial replacement, but if I append a new community to the list, and then try to pass it back into the testr data.frame, I get an error:
Error in `[<-.data.frame`(`*tmp*`, which(testr$email == i), "communities",
: replacement has 2 rows, data has 1
You'll note that I'm trying to create a list of vectors, which is just one way I've tried to figure this out. I thought maybe I could force R to see the list as a single object, even though it contains multiple items -- or in this case a vector of multiple items.
Is this just impossible in R, to have varied length vectors or lists as a single variable in a data frame?