I have two datasets.
A 10x1 matrix containing country names:
countries <- structure(
  c("usa", "canada", "france", "england", "brazil",
    "spain", "germany", "italy", "belgium", "switzerland"),
  .Dim = c(10L, 1L))
And a 20x2 matrix containing 3-grams and the ids of those 3-grams:
tri_grams <- structure(
  c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
    "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
    "mo", "an", "ce", "ko", "we", "ge", "ma", "fi", "br", "ca",
    "gi", "po", "ro", "ch", "ru", "tz", "il", "sp", "ai", "jo"),
  .Dim = c(20L, 2L),
  .Dimnames = list(NULL, c("id", "triGram")))
I want to loop over the countries and, for each row, get the tri_grams that occur in the country name. For example, "brazil" contains "br" (id "9") and "il" (id "17"). The information I want is the pair (index of the country (double), id of the tri-gram (char)). So for brazil I want to get (5,"9") and (5,"17").
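To make the expected output concrete, here is a minimal sketch of the lookup for a single country. It repeats the tri_grams matrix so that it runs on its own; the names country, hits and brazil_ids are only illustrative, and fixed = TRUE is an assumption that the tri-grams are literal strings rather than regular expressions.

```r
# Same tri_grams matrix as above, repeated so the snippet is self-contained.
tri_grams <- structure(
  c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
    "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
    "mo", "an", "ce", "ko", "we", "ge", "ma", "fi", "br", "ca",
    "gi", "po", "ro", "ch", "ru", "tz", "il", "sp", "ai", "jo"),
  .Dim = c(20L, 2L),
  .Dimnames = list(NULL, c("id", "triGram")))

country <- "brazil"  # illustrative single country (row 5 of countries)

# Test every tri-gram against the one country name.
# fixed = TRUE (assumption): treat each tri-gram as a literal string.
hits <- vapply(tri_grams[, "triGram"], grepl, logical(1),
               x = country, fixed = TRUE)

# Keep the ids of the tri-grams that matched.
brazil_ids <- tri_grams[hits, "id"]
print(brazil_ids)
```

For "brazil" the matching ids are "9" and "17", which paired with the country index 5 gives the two rows I expect.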
Here is the code with a simple nested loop:
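# Preallocate the worst case: every tri-gram matches every country
res <- matrix(ncol = 2, nrow = nrow(countries) * nrow(tri_grams))
colnames(res) <- c("indexCountry", "idTriGram")
k <- 0  # number of matches found so far
for (i in seq_len(nrow(countries))) {
  for (j in seq_len(nrow(tri_grams))) {
    # Does tri-gram j occur in country i?
    if (grepl(tri_grams[j, 2], countries[i, 1])) {
      k <- k + 1
      res[k, 1] <- i
      res[k, 2] <- tri_grams[j, 1]
    }
  }
}
# Keep only the filled rows (stays a matrix even when k is 0 or 1)
res <- res[seq_len(k), , drop = FALSE]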
It works correctly, and here are the results:
      indexCountry idTriGram
 [1,] "2"          "2"
 [2,] "2"          "10"
 [3,] "3"          "2"
 [4,] "3"          "3"
 [5,] "4"          "2"
 [6,] "5"          "9"
 [7,] "5"          "17"
 [8,] "6"          "18"
 [9,] "6"          "19"
[10,] "7"          "2"
[11,] "7"          "6"
[12,] "7"          "7"
[13,] "9"          "11"
[14,] "10"         "2"
[15,] "10"         "16"
I want to get the same result using apply instead of the explicit loop. This is just a small sample: my real dataset is huge, and the loop version takes a very long time to run on it (more than 10 hours). I tried to rewrite it with apply but did not succeed.
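For reference, this is the kind of vectorised shape I am aiming for, as a rough sketch rather than a finished solution. It relies on grepl() being vectorised over its x argument, so one call per tri-gram checks all countries at once, and which(arr.ind = TRUE) then recovers the (country, tri-gram) positions. The names hit, idx and res2 are illustrative, and fixed = TRUE is again an assumption that the tri-grams are literal strings.

```r
countries <- structure(
  c("usa", "canada", "france", "england", "brazil",
    "spain", "germany", "italy", "belgium", "switzerland"),
  .Dim = c(10L, 1L))
tri_grams <- structure(
  c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
    "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
    "mo", "an", "ce", "ko", "we", "ge", "ma", "fi", "br", "ca",
    "gi", "po", "ro", "ch", "ru", "tz", "il", "sp", "ai", "jo"),
  .Dim = c(20L, 2L),
  .Dimnames = list(NULL, c("id", "triGram")))

# One grepl() call per tri-gram, each vectorised over all countries.
# hit[i, j] is TRUE when tri-gram j occurs in country i.
# fixed = TRUE (assumption): match tri-grams as literal strings.
hit <- vapply(tri_grams[, "triGram"],
              function(p) grepl(p, countries[, 1], fixed = TRUE),
              logical(nrow(countries)))

# Positions of the TRUE cells as a two-column (row, col) matrix.
idx <- which(hit, arr.ind = TRUE)

# which() walks the matrix column by column; reorder by country, then
# tri-gram, to reproduce the row order of the nested loop.
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# cbind() of an integer and a character vector gives a character matrix,
# like the loop result.
res2 <- cbind(indexCountry = idx[, "row"],
              idTriGram    = tri_grams[idx[, "col"], "id"])
print(res2)
```

On the sample data this gives the same 15 rows as the loop. Two caveats: the intermediate hit matrix has nrow(countries) * nrow(tri_grams) cells, so on a huge dataset it may have to be built in chunks, and any speed-up here comes from the vectorised grepl() calls, not from the apply-style wrapper itself.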