I asked a question and received a great answer that solved my problem. However, I now want to modify the code (my previous question is here):
finding similar strings in each row of two different data frame
Let me explain the problem again and how I tried to deal with it.
The answer by Karsten W. gave me normalised data (each string becomes a name, and its value is the row number it came from), as follows (I did not change this part):
normalize <- function(x, delim) {
    # strip literal parentheses from the strings
    x <- gsub(")", "", x, fixed=TRUE)
    x <- gsub("(", "", x, fixed=TRUE)
    # repeat each row number once per string in that row
    # (number of delimiters in the row + 1)
    idx <- rep(seq_len(length(x)), times=nchar(gsub(sprintf("[^%s]", delim), "", as.character(x))) + 1)
    # split each row into its individual strings
    names <- unlist(strsplit(as.character(x), delim))
    # return the row numbers, named by the strings they belong to
    return(setNames(idx, names))
}
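For example (a minimal sketch with made-up input; the strings A, B, C are hypothetical), normalize returns the row numbers named by the individual strings:

x <- c("(A;B)", "C")
normalize(x, ";")
# A B C
# 1 1 2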
The second part was to apply the above function to each column separately; if I need to do that for 1000 columns, it is very time consuming. The per-column calls looked like this (kept as comments):
# s1 <- normalize(df1[,1], ";")
# s2 <- normalize(df1[,2], ";")
Instead, I tried to use lapply and do it like this:
myS <- lapply(df1, normalize, ";")
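As far as I understand, lapply passes ";" on as the delim argument, so this should be equivalent to the per-column calls, with myS being a named list keyed by the column names of df1:

# should be TRUE if the lapply call matches the per-column calls
identical(myS[[1]], normalize(df1[,1], ";"))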
I keep the other part as it is:
lookup <- normalize(df2[,1], ",")
Then, to compare the two data sets, I modified the function so it keeps only the row numbers of df2 (I removed s[found] from it):
process <- function(s) {
    # look up each string of s in the lookup vector built from df2
    lookup_try <- lookup[names(s)]
    # keep only the strings that were actually found
    found <- which(!is.na(lookup_try))
    # row numbers in df2 of the matched strings
    pos <- lookup_try[names(s)[found]]
    return(paste(pos, sep=""))
}
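To make the expected behaviour concrete, here is a tiny made-up example (the strings and row numbers are hypothetical, and lookup is overwritten just for this demo):

s <- setNames(c(1, 1, 2), c("A", "B", "C"))   # as produced by normalize from df1
lookup <- setNames(c(5, 7), c("B", "D"))      # as produced by normalize from df2
process(s)
# [1] "5"   (B was found in row 5 of df2)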
But on my real data, whatever I do, I cannot get the output:
process(myS$sample1)
...
In the end I need to have the data in a txt file, or something similar that I can read back in. I used write.table, but this does not work.
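This is roughly what I tried (results.txt is just an example file name):

results <- lapply(myS, process)
write.table(results, file="results.txt")
# fails when the columns return different numbers of matches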
Is there a better way to do this? How can I do it automatically?