
I'm trying to apply the grabl function from the stringdist package to a large character vector, "testref". I want to check whether the strings in another character vector, "testtitle", can be found in "testref". However, grabl only allows a single string to be tested at a time.

How can I circumvent this limitation?

Example to reproduce

#in reality each of the elements contains a full bibliography of a scientific article
testref <- c("asdfd sfgdgags dgsd.dsfas.dfs.f.sfas.f My beatiful title asfsdf dsf asfd dsf dsfsdfdsfsd, fdsf sdfdf: fsd fsdfafsd (2000) dsdfsf sfda", "sdfasfdsd, sdfsddf, fsagsg: sfds sfasdf sdfsdf", "sadfsdf: sdfsdf sdfggsdg another title here sdfdfsds, asdgasg (2021) blablabal")  

#the pattern vector can contain up to 500 titles of scientific articles that contain typos or formatting mistakes. Hence, I need to use approximate matching
testtitle <- c("holy cow", "random notes", "MI beautiful title", "quantitative research is hard", "an0ther title here")


What I want to get out of this is a list of logical TRUE/FALSE vectors

results_list
#[[1]]
#[1] FALSE FALSE FALSE 

#[[2]]
#[1] FALSE FALSE FALSE

#[[3]]
#[1] TRUE FALSE FALSE

#[[4]]
#[1] FALSE FALSE FALSE

#[[5]]
#[1] FALSE FALSE TRUE

So far, I have tried looping over the titles, as per @Rui Barradas's suggestion. Technically it works, but it takes a very long time.

results_list <- vector("list", length = 5)
for(i in 1:5) {
  results_list[[i]] <- grabl(testref, testtitle[i], maxDist = 8)
}

I was wondering whether it is possible to use lapply in combination with the grabl function.

results_list <- lapply(testtitle, function(testtitle) grabl(testref, testtitle[], maxDist = 2))

But I get this error: Error in grabl(testref, testtitle[], maxDist = 2) : could not find function "grabl"

I'm very grateful for your past suggestions and hope for more input!

Thank you!

Jonas
  • What is the expected result? I think we can infer enough contents for 3 rows of `x` and 10 rows of `ref_year2002` to create 1-column frames (I think that's enough), but what are you hoping to get as a result from this? Please provide a literal object with real values in it that match this sample data. Thanks! – r2evans Aug 11 '22 at 13:15
  • Thank you for the reply! I am hoping to get an output vector for each title I'm testing, which I can bind together into a matching matrix. My aim is to find where a title appears in the references of another title; it's an inter-citation matrix. I was trying to provide an example, but R always gives me the following error: unexpected symbol in: "al Tat pathway (1999) J. Biol. Chem., 274, pp. 13223-13228; Sanders, C., Wethkamp, N., Lill, H., Transport of cytochrome c derivatives by the bacterial Tat protein translocation system (2001) tablex <- c("Angelini" – Jonas Aug 12 '22 at 07:21
  • *"Please provide a literal object with real values in it that match this sample data."* – r2evans Aug 12 '22 at 11:28
  • Thank you for checking in again. I just updated my question and hope that this is a reproducible example :) – Jonas Aug 12 '22 at 13:40
  • (1) I still see no expected output values. (2) Regardless, the error `could not find function "grabl"` is a duplicate of https://stackoverflow.com/q/7027288/3358272. Try either leading (once) with `library(stringdist)` or using `stringdist::grabl(..)` in place of `grabl(..)`. – r2evans Aug 12 '22 at 16:16
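
A minimal sketch of the fix suggested in the last comment, combined with the lapply idea from the question. It assumes the stringdist package is installed; the sample strings are shortened versions of the question's data, and `maxDist = 3` is an assumed threshold chosen only for illustration, not a recommended value.

```r
library(stringdist)  # attach once so that grabl() can be found

# Shortened versions of the question's sample data (illustrative only)
testref <- c(
  "asdfd sfgdgags My beatiful title asfsdf (2000) dsdfsf sfda",
  "sdfasfdsd, sdfsddf, fsagsg: sfds sfasdf sdfsdf",
  "sadfsdf: sdfsdf another title here sdfdfsds (2021) blablabal"
)
testtitle <- c("holy cow", "MI beautiful title", "an0ther title here")

# grabl() takes a single pattern, so call it once per title.
# maxDist = 3 is an assumption; tune it for real data.
results_list <- lapply(testtitle, function(title) {
  grabl(testref, title, maxDist = 3)
})
```

Each element of `results_list` is a logical vector with one entry per element of `testref`. Note that lapply mainly tidies the loop; it is not expected to be much faster than the for loop, since grabl is still called once per title.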

1 Answer


Something like the following might do what you want. Untested, since there is no data.

library(stringdist)

# create a list to hold the results beforehand
results_list <- vector("list", length = 126)
for(i in 1:126) {
  results_list[[i]] <- grabl(year2002$References, ref_year2002$Title[i], maxDist = 8)
}
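
As a possible alternative to calling grabl once per title, the sketch below uses `afind()` from the same package, which, as far as I know, accepts a vector of patterns and returns a list whose `distance` element is a matrix with one row per element of `x` and one column per pattern. The sample data are shortened from the question, and the threshold of 3 is an assumption for illustration. Whether this is actually faster on real bibliographies would need to be benchmarked.

```r
library(stringdist)

# Shortened versions of the question's sample data (illustrative only)
testref <- c(
  "asdfd sfgdgags My beatiful title asfsdf (2000) dsdfsf sfda",
  "sdfasfdsd, sdfsddf, fsagsg: sfds sfasdf sdfsdf",
  "sadfsdf: sdfsdf another title here sdfdfsds (2021) blablabal"
)
testtitle <- c("holy cow", "MI beautiful title", "an0ther title here")

# One call for all titles: distance[i, j] is the best approximate-match
# distance of testtitle[j] within testref[i].
# The cutoff of 3 is an assumed threshold; tune it for real data.
hits <- afind(testref, testtitle)$distance <= 3

# Convert the logical matrix into one logical vector per title
results_list <- lapply(seq_len(ncol(hits)), function(j) hits[, j])
```

Here `hits` is already the title-by-reference matching matrix (transposed), so the final conversion to a list is only needed if the list form from the question is required.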
Rui Barradas