I am working on a primitive speech analysis algorithm, and I want to improve how it handles negated positive/negative statements. At the moment, I add the string "NOT_" only when the word directly follows the negation:
s_commentsOut$gsubContent <- gsub("not ","not NOT_",gsub("n't ","n't NOT_",s_commentsOut$lowCo))
So, for example, "This is not good" becomes "This is not NOT_good".
Now I also want "NOT_" to be added when a word from a vector of target words occurs up to n characters after the negation. The target words are, for example:
targetList <- c("nice", "perfect", "good", "love")
Using this list, the string "This isn't a very good way" should become "This isn't a very NOT_good way".
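One possible approach (a sketch, not taken from the linked answers) is to build a single PCRE pattern from `targetList`: a negation, then a lazy window of at most n characters, then a target word starting at a word boundary. It assumes the input is already lowercased, as `s_commentsOut$lowCo` appears to be; `negate_targets` is a hypothetical name.

```r
# negate_targets() is a hypothetical helper, not part of any package.
# It expects lowercased text (like s_commentsOut$lowCo) and prefixes "NOT_"
# to the first target word that starts within n characters after a negation.
negate_targets <- function(text, targets, n = 15) {
  pattern <- paste0(
    "(\\bnot\\b|n't\\b)",                          # group 1: the negation
    "(.{0,", n, "}?)",                             # group 2: at most n characters (lazy)
    "\\b(", paste(targets, collapse = "|"), ")"    # group 3: a target word
  )
  gsub(pattern, "\\1\\2NOT_\\3", text, perl = TRUE)
}

targetList <- c("nice", "perfect", "good", "love")
negate_targets(tolower("This isn't a very good way"), targetList)
# [1] "this isn't a very NOT_good way"
```

Because the window is lazy and `gsub()` matches do not overlap, each negation tags only the first target after it. The trailing `\\b` in `\\bnot\\b` also keeps words such as "nottingham" from counting as a negation.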
This replacement should only take place if the negation occurs at most n (for instance 15) characters before the target. For example, the following should not be converted, because the distance between the target and the negation is greater than 15:
"This is not going to work. However you did this very nicely."
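If the distance limit should be checked explicitly rather than inside the pattern, the match positions can be compared directly. This is a hypothetical sketch, again assuming lowercased input: `gregexpr()` finds the negations and the targets, and a target gets "NOT_" when at most n characters separate it from the end of a preceding negation. `negate_by_distance` is an illustrative name.

```r
# negate_by_distance() is a hypothetical helper; it expects lowercased text.
negate_by_distance <- function(text, targets, n = 15) {
  neg <- gregexpr("\\bnot\\b|n't\\b", text, perl = TRUE)[[1]]
  tgt <- gregexpr(paste0("\\b(", paste(targets, collapse = "|"), ")"),
                  text, perl = TRUE)[[1]]
  if (neg[1] == -1 || tgt[1] == -1) return(text)   # nothing to tag

  # position of the first character after each negation
  neg_end <- as.vector(neg) + attr(neg, "match.length")
  # keep a target if some negation ends at most n characters before it
  keep <- vapply(as.vector(tgt), function(p) {
    gaps <- p - neg_end[neg_end <= p]
    length(gaps) > 0 && min(gaps) <= n
  }, logical(1))

  # insert "NOT_" from right to left so earlier positions stay valid
  for (pos in rev(as.vector(tgt)[keep])) {
    text <- paste0(substr(text, 1, pos - 1), "NOT_",
                   substr(text, pos, nchar(text)))
  }
  text
}

targetList <- c("nice", "perfect", "good", "love")
negate_by_distance(
  "this is not going to work. however you did this very nicely.", targetList)
# [1] "this is not going to work. however you did this very nicely."
```

Unlike a single regex pass, this tags every target within range of a negation, and it measures raw distance, so it can reach across a sentence end.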
I found the following SO questions: "Negation of several characters before pattern" and "How to replace a character in a string but only if it occurs within a delimited substring?"
However, I am struggling to get it right. In the meantime, I work around the problem by removing filler strings such as "like ", "an " and "a " from the text.
Further test phrases:
"Nottingham is the love of my life."
"This is good. Nottingham is a town."
"This is not very good"
"This is not good. This is not good. This is not very good. This is nice. This very nice. This is not very nice."
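To run the test phrases, here is a sketch of a variant of the regex window that uses `[^.!?]` instead of `.`, so the window never crosses a sentence end. Stopping at sentence punctuation is an assumption about the desired behavior, not something stated above; `negate_in_sentence` and `phrases` are hypothetical names.

```r
targetList <- c("nice", "perfect", "good", "love")

# Hypothetical helper: like a plain regex window, but "[^.!?]" keeps the
# up-to-n-character gap inside one sentence. Expects lowercased text.
negate_in_sentence <- function(text, targets, n = 15) {
  pattern <- paste0("(\\bnot\\b|n't\\b)([^.!?]{0,", n, "}?)\\b(",
                    paste(targets, collapse = "|"), ")")
  gsub(pattern, "\\1\\2NOT_\\3", text, perl = TRUE)
}

phrases <- c(
  "Nottingham is the love of my life.",
  "This is good. Nottingham is a town.",
  "This is not very good",
  "This is not good. This is not good. This is not very good. This is nice. This very nice. This is not very nice."
)
out <- negate_in_sentence(tolower(phrases), targetList)
out
```

With this pattern the first two phrases come back unchanged, "not very good" becomes "not very NOT_good", and in the last phrase only the targets after "not" are tagged, while "this is nice" and "this very nice" are left alone.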