I’m trying to use str_extract() from the stringr package to extract text from between square brackets using the pattern "(\\[){1}(.*)(\\]){1}". This works fine when the text between sets of brackets is separated by a new line (i.e. \n). Otherwise I get chunks of text that span multiple brackets.
So when I run:
library(stringr)
my_text <- "[Sed ut perspiciatis] [unde omnis iste natus] error sit architecto beatae vitae dicta sunt explicabo. \n [Nemo] sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. [consectetur], adipisci tempora incidunt ut \n [labore] et dolore magnam aliquam quaerat voluptatem. Ut consequatur, vel illum qui dolorem eum fugiat quo \n [voluptas nulla] pariatur?"
str_extract_all(my_text, "(\\[){1}(.*)(\\]){1}")
I get:
[[1]]
[1] "[Sed ut perspiciatis] [unde omnis iste natus]"
[2] "[Nemo] sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. [consectetur]"
[3] "[labore]"
[4] "[voluptas nulla]"
while I would like to obtain:
[[1]]
[1] "[Sed ut perspiciatis]"
[2] "[unde omnis iste natus]"
[3] "[Nemo]"
[4] "[consectetur]"
[5] "[labore]"
[6] "[voluptas nulla]"
How would I go about doing this?
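For reference, here is a minimal sketch of two patterns that might give the desired result, assuming the brackets are never nested. The cause appears to be that .* is greedy, so it runs to the last ] on the line; it only stops at \n because . does not match a newline by default in stringr's regex engine. The sample string is a shortened version of my_text, and the names by_class and by_lazy are only illustrative.

```r
library(stringr)

# Shortened version of the sample text from the question.
my_text <- "[Sed ut perspiciatis] [unde omnis iste natus] error sit. \n [Nemo] sed quia. [consectetur], adipisci \n [labore] et dolore. \n [voluptas nulla] pariatur?"

# Option 1: negated character class.
# Match "[", then any run of characters that are not "]", then "]".
by_class <- str_extract_all(my_text, "\\[[^\\]]*\\]")[[1]]

# Option 2: lazy quantifier.
# ".*?" matches as few characters as possible, so it stops at the first "]".
by_lazy <- str_extract_all(my_text, "\\[.*?\\]")[[1]]

print(by_class)
print(by_lazy)
```

On this sample both vectors should hold the six bracketed strings, one per element. The two options differ if a bracketed chunk contains a line break: the negated class can match across \n, while the lazy version cannot, since . still skips newlines.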