Assume a character vector like the following:
file1_p1_analysed_samples.txt
file1_p1_raw_samples.txt
f2_file2_p1_analysed_samples.txt
f3_file3_p1_raw_samples.txt
Desired output:
file1_p1_analysed
file1_p1_raw
file2_p1_analysed
file3_p1_raw
I would like to compare the elements and remove as much as possible from the start and end of each string while keeping the results unique.
The above is just an example. The parts to be removed are not common to all elements, so I need a general solution that does not depend on the strings shown here.
So far I have been able to strip the parts that are common to all elements, provided every element splits into the same number of parts on the separator. Here is the function:
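mf <- function(x, sep) {
  # Split each element on the separator; rbind assumes equal-length pieces.
  xsplit <- strsplit(x, split = sep)
  xdfm <- as.data.frame(do.call(rbind, xsplit))
  res <- list()
  for (i in seq_len(ncol(xdfm))) {
    # Keep only the columns whose values differ between elements.
    if (!all(xdfm[, i] == xdfm[1, i])) {
      res[[length(res) + 1]] <- as.character(xdfm[, i])
    }
  }
  # One row per kept column; paste each element's kept parts back together.
  res <- as.data.frame(do.call(rbind, res))
  res <- apply(res, 2, function(x) paste(x, collapse = "_"))
  return(res)
}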
Applying the above function:
1.
> a = c("a_samples.txt","b_samples.txt")
> mf(a,"_")
V1 V2
"a" "b"
2.
> b = c("apple.fruit.txt","orange.fruit.txt")
> mf(b,sep = "\\.")
V1 V2
"apple" "orange"
If the elements split into different numbers of parts, this doesn't work: rbind recycles the shorter vectors (with a warning), so the columns no longer line up.
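For the unequal-length case, one possible direction is to compare tokens from each end instead of by column. The sketch below is only an assumption about how that could look, not a full solution: strip_common_ends is a hypothetical helper name, it treats sep as a literal string (fixed = TRUE) rather than a regex, and it only removes tokens shared by every element, so it does not yet drop the per-element f2_ / f3_ prefixes from the desired output.

```r
# Hypothetical helper: drop the tokens that every element shares at the
# start and at the end, even when elements split into different lengths.
strip_common_ends <- function(x, sep = "_") {
  parts <- strsplit(x, split = sep, fixed = TRUE)  # sep is literal here
  min_len <- min(lengths(parts))

  # Count how many leading tokens are identical across all elements.
  count_common <- function(tokens) {
    n <- 0
    while (n < min_len &&
           length(unique(vapply(tokens, `[`, character(1), n + 1))) == 1) {
      n <- n + 1
    }
    n
  }

  n_head <- count_common(parts)
  n_tail <- count_common(lapply(parts, rev))  # reversed, so this counts the tail
  # Never remove more tokens than the shortest element has.
  n_tail <- min(n_tail, min_len - n_head)

  out <- vapply(parts, function(p) {
    keep <- seq_len(length(p) - n_head - n_tail) + n_head
    paste(p[keep], collapse = sep)
  }, character(1))

  # Fall back to the input if trimming produced duplicates or empty strings.
  if (anyDuplicated(out) > 0 || any(out == "")) x else out
}

x <- c("file1_p1_analysed_samples.txt", "file1_p1_raw_samples.txt",
       "f2_file2_p1_analysed_samples.txt", "f3_file3_p1_raw_samples.txt")
strip_common_ends(x)
# "file1_p1_analysed" "file1_p1_raw" "f2_file2_p1_analysed" "f3_file3_p1_raw"
```

This gets rid of the shared _samples.txt ending despite the ragged splits, but the leading f2 / f3 tokens remain because they are not common to all elements.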