I try to split a vector of strings into a data.frame object and for a fixed order this isn't a problem (e.g. like written here), but in my particular case the columns for the future data-frame are not complete in the string objects. This is how the output should look like for an toy input:
input <- c("an=1;bn=3;cn=45",
"bn=3.5;cn=76",
"an=2;dn=5")
res <- do.something(input)
> res
an bn cn dn
[1,] 1 3 45 NA
[2,] NA 3.5 76 NA
[3,] 2 NA NA 5
I am looking now for a function do.something
that can do that in a efficient way. My naive solution at the moment would be to loop over the input objects, strsplit
those for ;
then strsplit
them again for =
and then fill the data.frame
result by result.
Is there any way to do that more R-alike? I am afraid doing that element by element would take quite a long time for a long vector input
.
EDIT: Just for completeness, my naive solution looks like this:
do.something <- function(x){
temp <- strsplit(x,";")
temp2 <- sapply(temp,strsplit,"=")
ul.temp2 <- unlist(temp2)
label <- sort(unique(ul.temp2[seq(1,length(ul.temp2),2)]))
res <- data.frame(matrix(NA, nrow = length(x), ncol = length(label)))
colnames(res) <- label
for(i in 1:length(temp)){
for(j in 1:length(label)){
curInfo <- unlist(temp2[[i]])
if(sum(is.element(curInfo,label[j]))>0){
res[i,j] <- curInfo[which(curInfo==label[j])+1]
}
}
}
res
}
EDIT2: Unfortunately my large input data looks like this (entries without '=' possible):
input <- c("an=1;bn=3;cn=45",
"an;bn=3.5;cn=76",
"an=2;dn=5")
so I cannot compare the given answers to my problem at hand. My naive solution for that is
do.something <- function(x){
temp <- strsplit(x,";")
tempNames <- sort(unique(sapply(strsplit(unlist(temp),"="),"[",1)))
res <- data.frame(matrix(NA, nrow = length(x), ncol = length(tempNames)))
colnames(res) <- tempNames
for(i in 1:length(temp)){
curSplit <- strsplit(unlist(temp[[i]]),"=")
curNames <- sapply(curSplit,"[",1)
curValues <- sapply(curSplit,"[",2)
for(j in 1:length(tempNames)){
if(is.element(colnames(res)[j],curNames)){
res[i,j] <- curValues[curNames==colnames(res)[j]]
}
}
}
res
}