Extracting from text blocks with stringr

Question

I’m trying to use str_extract_all() from the stringr package to extract the bracketed text (square brackets included) using the pattern "(\\[){1}(.*)(\\]){1}". This works fine when each bracketed chunk sits on its own line (i.e. the chunks are separated by \n). When several bracketed chunks share a line, I get matches that span multiple sets of brackets.

So when:

my_text <- "[Sed ut perspiciatis]  [unde omnis iste natus] error sit architecto beatae vitae dicta sunt explicabo. \n [Nemo] sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt.  [consectetur], adipisci tempora incidunt ut \n [labore] et dolore magnam aliquam quaerat voluptatem. Ut consequatur, vel illum qui dolorem eum fugiat quo \n [voluptas nulla] pariatur?"

library(stringr)
str_extract_all(my_text, "(\\[){1}(.*)(\\]){1}")

I get:

[[1]]
[1] "[Sed ut perspiciatis]  [unde omnis iste natus]"                                                      
[2] "[Nemo] sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt.  [consectetur]"
[3] "[labore]"                                                                                            
[4] "[voluptas nulla]"

while I would like to obtain:

[[1]]
[1] "[Sed ut perspiciatis]"
[2] "[unde omnis iste natus]"
[3] "[Nemo]"
[4] "[consectetur]"
[5] "[labore]"
[6] "[voluptas nulla]"

How would I go about doing this?

Lazy dot `.*?` or `[^\\]\\[]*` negated character class will work. — Wiktor Stribiżew, Sep 04 '17 at 17:39
`unlist(regmatches(my_text, gregexpr("\\[[[:alpha:] ]+\\]", my_text)))` also seems to work. — lmo, Sep 04 '17 at 17:48
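The behaviour in the question most likely comes from `.*` being greedy: it runs to the last `]` it can reach, and it only stops at `\n` because `.` does not match a newline by default in stringr's regex engine. Here is a minimal sketch of the two fixes from the first comment, using the `my_text` from the question; the variable names `lazy`, `negated` and `inner` are only illustrative.

```r
library(stringr)

my_text <- "[Sed ut perspiciatis]  [unde omnis iste natus] error sit architecto beatae vitae dicta sunt explicabo. \n [Nemo] sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt.  [consectetur], adipisci tempora incidunt ut \n [labore] et dolore magnam aliquam quaerat voluptatem. Ut consequatur, vel illum qui dolorem eum fugiat quo \n [voluptas nulla] pariatur?"

# Option 1: lazy quantifier. `.*?` stops at the first closing bracket.
lazy <- str_extract_all(my_text, "\\[.*?\\]")[[1]]

# Option 2: negated character class. Match anything that is not a bracket,
# so a match can never run past the nearest "]".
negated <- str_extract_all(my_text, "\\[[^\\]\\[]*\\]")[[1]]

# If only the text inside the brackets is wanted, a capture group with
# str_match_all() is one way to get it (column 2 holds the first group).
inner <- str_match_all(my_text, "\\[([^\\]\\[]*)\\]")[[1]][, 2]

print(lazy)
print(negated)
print(inner)
```

Both patterns should return the same six bracketed chunks for this input. The negated class also keeps working if a bracketed chunk contains a newline, which the lazy dot does not unless dot-all matching is enabled.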
