I am using rm_between (from qdapRegex) to extract text from the sentence below. (The target text was highlighted in bold only to clarify the question; in the original dataset all text is formatted the same. I am trying to extract based on the location between two specified strings.)
I need to extract:
\nInterpretations\nThere is increased acid, along with significant correlation with node. consistent with ber. \neSigned by KMN MA 6/1/2020;data;reports;
or extract this:
\nInterpretations\nThere is increased acid, along with significant correlation with node. consistent with ber. \neSigned by KMN MA 6/1/2020 ;data;reports;
I tried the following code, but every call returns NA.
Any suggestions? I would prefer to keep using the same package (I have already extracted other phrases from the same dataset with it), but I am willing to try others if you suggest them.
x$Impression2 = rm_between(x$nam, "Interpretations\\n", ";data", extract=TRUE)
x$Impression2 = rm_between(x$nam, "Interpretations\\n", "data;reports", extract=TRUE)
x$Impression2 = rm_between(x$nam, "Interpretations\\n", "[[:digit:]];data", extract=TRUE)
x$Impression2 = rm_between(x$nam, "Interpretations\\n", "\\d;data", extract=TRUE)
x$Impression2 = rm_between(x$nam, "Interpretations\\n", "\\d;data", fixed = FALSE, extract=TRUE)
x$Impression2 = rm_between_multiple(x$nam, "Interpretations\\n", "[ ]{2,}", extract=TRUE)
I think the problem is the newline inside the text I want to extract. I could change all newlines to spaces and then extract (for example, using the approach from "remove all line breaks (enter symbols) from the string using R"), but I would prefer to keep the newlines if possible. Any suggestion is highly appreciated.
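For reference, here is a minimal sketch of the two options in base R rather than qdapRegex. It assumes that x$nam holds real newline characters (not a literal backslash followed by n). The sample string and the helper name extract_between are hypothetical, made up for illustration only. The first call flattens the line breaks before extracting. The second keeps them by turning on the PCRE (?s) flag, which lets "." match newlines.

```r
# Hypothetical sample value standing in for one element of x$nam;
# assumed to contain actual newline characters.
nam <- "header text\nInterpretations\nThere is increased acid, along with significant correlation with node. consistent with ber. \neSigned by KMN MA 6/1/2020;data;reports;"

# Hypothetical helper: return the first match of a PCRE pattern, or NA if none.
extract_between <- function(txt, pattern) {
  m <- regexpr(pattern, txt, perl = TRUE)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(txt))
  out[hit] <- regmatches(txt, m)
  out
}

# Option 1: replace every run of line breaks with one space, then extract.
flat <- gsub("[\r\n]+", " ", nam)
impression_flat <- extract_between(flat, "(?<=Interpretations ).*?(?=;data)")

# Option 2: keep the newlines; (?s) makes "." match line breaks as well.
impression_nl <- extract_between(nam, "(?s)(?<=Interpretations\\n).*?(?=;data)")
```

Both calls use a lazy match with a lookahead, so the result stops just before the first ";data". Whether rm_between can be made to behave the same way is not something this sketch shows.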
Thank you.