I have texts written by doctors, and I want to highlight specific words in their context (the 5 words before and the 5 words after the word I search for). Say I want to search for the word 'suicidal'. I would then use the `kwic()` function from the quanteda package:
```r
kwic(tokens(dataset$text), pattern = "suicidal", window = 5)
```
So far, so good. But say I want to allow for typos: a word should still match if it deviates from 'suicidal' by up to three characters (for example, substituted or missing letters), no matter where in the word those deviations occur.
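To make "up to three deviating characters" concrete, this amounts to an edit (Levenshtein) distance of at most 3. That distance counts the minimum number of character substitutions, insertions and deletions needed to turn one word into another. Base R's `utils::adist()` computes it. The sketch below only illustrates the criterion on a few spellings from the example. It is not a quanteda feature.

```r
# utils::adist() returns the generalized Levenshtein distance: the minimal
# number of insertions, deletions and substitutions between two strings.
target <- "suicidal"
candidates <- c("suicidal", "suicidaa", "suiaaaal", "icidal", "uicida", "patient")

# adist(x, y) returns a length(x) x length(y) matrix; keep the single column
edit_dist <- utils::adist(candidates, target)[, 1]
names(edit_dist) <- candidates
print(edit_dist)

# A word "deviates by at most three characters" if its distance is <= 3
within_three <- edit_dist <= 3
print(within_three)
```

Missing letters count as one edit each, so "icidal" and "uicida" (two letters dropped) are at distance 2.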
Is it possible to do this with quanteda's `kwic()` function?
Example:

```r
library(quanteda)

dataset <- data.frame("patient" = 1:9, "text" = c(
  "On his first appointment, the patient was suicidal when he showed up in my office",
  "On his first appointment, the patient was suicidaa when he showed up in my office",
  "On his first appointment, the patient was suiciaaa when he showed up in my office",
  "On his first appointment, the patient was suicaaal when he showed up in my office",
  "On his first appointment, the patient was suiaaaal when he showed up in my office",
  "On his first appointment, the patient was saacidal when he showed up in my office",
  "On his first appointment, the patient was suaaadal when he showed up in my office",
  "On his first appointment, the patient was icidal when he showed up in my office",
  "On his first appointment, the patient was uicida when he showed up in my office"))
# Needed in R < 4.0, where data.frame() converts strings to factors by default
dataset$text <- as.character(dataset$text)
kwic(tokens(dataset$text), pattern = "suicidal", window = 5)
```
This returns only the first sentence, the one where the word is spelled correctly.
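One possible workaround, sketched under some assumptions, is to do the keyword-in-context step by hand. It splits each text into words and keeps every word within edit distance 3 of the search term (measured with `utils::adist()`). It then returns the 5 words on either side. `fuzzy_kwic()` and its `max_dist` argument are hypothetical names made up for this illustration; they are not part of quanteda. The whitespace tokenizer is also much cruder than `quanteda::tokens()`. In principle, the distinct spellings found this way could then be passed as a character vector to the `pattern` argument of `kwic()`.

```r
# Hypothetical helper (not a quanteda function): fuzzy keyword-in-context.
fuzzy_kwic <- function(texts, pattern, window = 5, max_dist = 3) {
  rows <- list()
  for (doc in seq_along(texts)) {
    # Crude whitespace tokenizer (assumption); punctuation stays on the word
    toks <- strsplit(texts[[doc]], "\\s+")[[1]]
    n <- length(toks)
    # Compare case-insensitively, ignoring punctuation attached to a word
    clean <- tolower(gsub("[[:punct:]]", "", toks))
    edit_dist <- utils::adist(clean, tolower(pattern))[, 1]
    for (i in which(edit_dist <= max_dist)) {
      # Up to `window` words before and after the hit, clipped at the edges
      pre_idx  <- if (i > 1) max(1, i - window):(i - 1) else integer(0)
      post_idx <- if (i < n) (i + 1):min(n, i + window) else integer(0)
      rows[[length(rows) + 1]] <- data.frame(
        docname  = paste0("text", doc),
        position = i,
        pre      = paste(toks[pre_idx], collapse = " "),
        keyword  = toks[i],
        post     = paste(toks[post_idx], collapse = " "),
        distance = edit_dist[i],
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(rows) == 0) return(data.frame())
  do.call(rbind, rows)
}

# Rebuild the nine example sentences from the question
words <- c("suicidal", "suicidaa", "suiciaaa", "suicaaal", "suiaaaal",
           "saacidal", "suaaadal", "icidal", "uicida")
texts <- paste("On his first appointment, the patient was", words,
               "when he showed up in my office")

res <- fuzzy_kwic(texts, "suicidal", window = 5, max_dist = 3)
print(res)
```

With `max_dist = 3`, every misspelling in the example falls within the threshold. None of the other words in these sentences comes close to 'suicidal'. On real clinical text, a threshold of 3 on an 8-letter word may also catch unrelated words, so the `distance` column is kept for manual review.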