I'm trying to apply the grabl function of stringdist to a large character vector "testref". I want to check for whether the strings in another character vector "testtitle" can be found in "testref". However, grabl does only allow for a single string to be tested at a time.
How can I circumvent this limitation?
Example to reproduce
#in reality each of the elements contains a full bibliography of a scientific article
testref <- c("asdfd sfgdgags dgsd.dsfas.dfs.f.sfas.f My beatiful title asfsdf dsf asfd dsf dsfsdfdsfsd, fdsf sdfdf: fsd fsdfafsd (2000) dsdfsf sfda", "sdfasfdsd, sdfsddf, fsagsg: sfds sfasdf sdfsdf", "sadfsdf: sdfsdf sdfggsdg another title here sdfdfsds, asdgasg (2021) blablabal")
#the pattern vector can contain up to 500 titles of scientific articles that contain typos or formatting mistakes. Hence, I need to use approximate matching
testtitle <- c("holy cow", "random notes", "MI beautiful title", "quantitative research is hard", "an0ther title here")
What I want to get out of this is a list of logical TRUE/FALSE vectors
results_list
#[[1]]
#[1] FALSE FALSE FALSE
#[[2]]
#[1] FALSE FALSE FALSE
#[[3]]
#[1] TRUE FALSE FALSE
#[[4]]
#[1] FALSE FALSE FALSE
#[[5]]
#[1] FALSE FALSE TRUE
So far I, I tried to loop the process as per @Rui Barradas suggestion. Technically it works, but it takes a very long time.
results_list <- vector("list", length = 5)
for(i in 1:5) {
results_list[[i]] <- grabl(testref, testtitle[i], maxDist = 8)
}
I was wondering whether it is possible to use lapply in combination with the grabl function.
results_list <- lapply(testtitle, function(testtitle) grabl(testref, testtitle[], maxDist = 2))
But I get this error: Error in grabl(testref, testtitle[], maxDist = 2) : could not find function "grabl"
I'm very grateful for your past suggestions and hope for more input!
Thank you!