Extract complex sentence using rm_between

Question

I am using rm_between (from qdapRegex) to extract text from the sentence below. The text of interest is highlighted in bold only to clarify the question; in the original dataset all the text looks the same. I am trying to extract based on its location between two specified strings.

I need to extract:

\nInterpretations\nThere is increased acid, along with significant correlation with node. consistent with ber. \neSigned by KMN MA 6/1/2020;data;reports;

or this:

\nInterpretations\nThere is increased acid, along with significant correlation with node. consistent with ber. \neSigned by KMN MA 6/1/2020 ;data;reports;

I tried the code below, but every attempt returns NA.

Any suggestions? I would prefer to keep using the same package (I have already extracted other phrases from the same dataset with it), but I am willing to try others if you suggest them.

x$Impression2 = rm_between(x$nam, "Interpretations\\n", ";data", extract=TRUE)

x$Impression2 = rm_between(x$nam, "Interpretations\\n", "data;reports", extract=TRUE)

x$Impression2 = rm_between(x$nam, "Interpretations\\n", "[[:digit:]];data", extract=TRUE)

x$Impression2 = rm_between(x$nam, "Interpretations\\n", "\\d;data", extract=TRUE)

x$Impression2 = rm_between(x$nam, "Interpretations\\n", "\\d;data", fixed = FALSE, extract=TRUE)

x$Impression2 = rm_between_multiple(x$nam, "Interpretations\\n", "[ ]{2,}", extract=TRUE)
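Because every attempt returns NA, it may help to check first what the delimiter actually looks like in the data. The sketch below uses only base R and a hypothetical one-row stand-in for x$nam; it assumes the "\n" shown in the sample is a real newline character, which may not hold for the actual CSV.

```r
# Hypothetical stand-in for one value of x$nam (assumes "\n" is a real newline).
nam <- "\nInterpretations\nThere is increased acid, along with significant correlation with node. consistent with ber. \neSigned by KMN MA 6/1/2020;data;reports;"

# TRUE if the string contains a real newline character.
has_newline <- grepl("\n", nam, fixed = TRUE)

# TRUE if it instead contains the two literal characters backslash and "n".
has_backslash_n <- grepl("\\n", nam, fixed = TRUE)

# writeLines() prints the string as stored, so a real newline appears as a line break.
writeLines(nam)
```

If has_backslash_n turns out to be TRUE on the real data, the delimiter is a literal backslash followed by "n", and a pattern that expects a newline would not match.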

I think the problem is the newlines inside the text I want to extract. I could replace all newlines with spaces and then extract (for example, as described in "remove all line breaks (enter symbols) from the string using R"), but I would prefer to keep the newlines if possible. Any suggestion is highly appreciated.
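One way to test the newline hypothesis without replacing anything is a base R extraction in which the dot is allowed to match newlines. This is a minimal sketch, not a qdapRegex solution: the sample vector is hypothetical, it assumes the text contains real newline characters, and it assumes the wanted text lies between "Interpretations\n" and ";data" as in the attempts above.

```r
# Hypothetical stand-in for x$nam; the second element has no markers at all.
nam <- c(
  "\nInterpretations\nThere is increased acid, along with significant correlation with node. consistent with ber. \neSigned by KMN MA 6/1/2020;data;reports;",
  "no markers here"
)

# (?s) lets "." match newlines; the lookbehind and lookahead keep the
# delimiters themselves out of the extracted text.
pattern <- "(?s)(?<=Interpretations\\n).*?(?=;data)"

m <- regexpr(pattern, nam, perl = TRUE)

# Start with NA everywhere, then fill in only the elements that matched.
impression <- rep(NA_character_, length(nam))
hit <- !is.na(m) & m != -1
impression[hit] <- regmatches(nam, m)
```

The extracted value keeps its internal newline, and elements without both markers stay NA. On the real data the same steps would be applied to x$nam in place of the sample vector.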

Thank you.

I honestly do not know if there is such a rule (I am new to working with strings). I only highlighted the text in bold here in the question to mark the text of interest. I am working with a CSV file in which all the text looks the same (I have clarified this in the question above). - Bahi8482, Jun 18 '20 at 13:32
