There are some helpful answers here about using rm_between when each observation contains only one pair of markers. However, I have a dataset where I want to extract the text inside double quotes, and some observations contain more than one quoted section. For example:
Fresh or chilled Atlantic salmon "Salmo salar" and Danube salmon "Hucho hucho"
When I use this code,
library(qdapRegex)
rf <- data.frame(rm_between_multiple(H2$SE_DESC_EN, c("\"", "\""), c("\"", "\"")))
it creates a data frame, and for the example line above it returns
"Fresh or chilled Atlantic salmon and Danube salmon"
which is perfect. However, I also need the text that was removed. To try to retain it, I changed my code slightly to:
H3 <- rm_between_multiple(H2$SE_DESC_EN, c("\"", "\""), c("\"", "\""), extract=TRUE)
to create a list containing the quoted text. For that same line, it returns:
c("Salmo salar", " and Danube salmon ", "Hucho hucho",
"Salmo salar", " and Danube salmon ", "Hucho hucho")
This contains the quoted text, but it also includes the text between the two quoted sections, and the whole result is repeated. I'm fairly new to programming and was wondering whether there is a way to write code that extracts only the quoted text, without the text that sits between the quotations.
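One possible approach, sketched in base R rather than qdapRegex (it swaps in regmatches/gregexpr/gsub for rm_between_multiple), assuming quotes are always properly paired and never nested or escaped. The vector desc is a hypothetical stand-in for H2$SE_DESC_EN, and the column names description and quoted are illustrative only. The sketch keeps both pieces: the description with the quoted parts removed, and the quoted parts themselves.

```r
# Hypothetical stand-in for H2$SE_DESC_EN; elements may contain zero, one,
# or several "..." sections
desc <- c(
  'Fresh or chilled Atlantic salmon "Salmo salar" and Danube salmon "Hucho hucho"',
  'Frozen fish fillets'
)

# A complete "..." pair: [^"]* cannot cross a quote character, so the text
# lying between two quoted sections is never matched
pattern <- '"[^"]*"'

# Description with every quoted section (and the whitespace before it) removed
cleaned <- gsub(paste0('\\s*', pattern), "", desc)

# Every quoted section per element, with the quote characters stripped;
# elements without quotes give character(0)
quoted <- lapply(regmatches(desc, gregexpr(pattern, desc)),
                 function(m) gsub('"', "", m, fixed = TRUE))

# Keep both pieces side by side, one row per original element;
# multiple quoted sections are collapsed into one string per row
result <- data.frame(
  description = cleaned,
  quoted = vapply(quoted, paste, character(1), collapse = "; "),
  stringsAsFactors = FALSE
)
result
```

Because each match consumes both the opening and the closing quote of a pair, the next match can only start after that closing quote. That is why " and Danube salmon " is never picked up and nothing is duplicated. If you prefer one list element per row instead of a collapsed string, use quoted directly.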