I am trying to extract all occurrences of the expressions between "[XX]\n-----\n" and "\n-", for all XX. Here is the code I have come up with.

library(stringr)

temp <- "\\[\\d+]\n-(.*)\n-"

x <- c("[65]\n-----\n this?\n-, some other rubbish, [3]\n-----\n that?\n-")
str_extract_all(x, temp)

The output is an empty character vector (character(0)). The desired output is a length-two character vector with entries "this" and "that".

Any help would be greatly appreciated!

I have tried many rephrasings of the regular expression. The problem is in the positive lookbehind, but I can't figure it out. I've tried finding related articles, but no luck.

*edited code typo

tobyink
Carmine
  • Why `/n`? It should be `\n`. The real issue is that `.` does not match newlines; add `(?s)` at the start of the pattern. See [this demo](https://ideone.com/nMyxZI). – Wiktor Stribiżew May 23 '23 at 07:10
  • Thanks, yes it should, I'll edit. Same output though; I think I changed that somewhere along the way by accident. It's fully reproducible in that example if you're able to find a solution! – Carmine May 23 '23 at 07:32
  • I was rushing and only saw the newline part of the comment; adding `(?s)` did the job. In my original code I use lookahead/lookbehind to trim off the end phrases (e.g. `temp <- "(?<=\\[\\d+]\n-)(.*)(?=\n-)"`), but it says the `\d+` in the lookbehind must have a maximum bound. Any idea how to get around that? Thanks for your comment! – Carmine May 23 '23 at 07:40
  • Yes, do not use any lookbehind, use `str_match_all` and just capture the part you are interested in. – Wiktor Stribiżew May 23 '23 at 07:46
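
The suggestion in the comments can be sketched roughly as follows. This is a minimal example, not a confirmed answer from the thread: the `-+\n` part of the pattern and the `str_trim` step are assumptions added here so that the whole dashed line is consumed and the leading space is dropped. Note that the captured text keeps the question marks that are present in `x`.

```r
library(stringr)

x <- c("[65]\n-----\n this?\n-, some other rubbish, [3]\n-----\n that?\n-")

# (?s) lets `.` match newlines; `.*?` is lazy, so each match stops at the
# first "\n-" after the captured text instead of running to the last one.
# Assumption: the header is "[digits]", a newline, a run of dashes, a newline.
pattern <- "(?s)\\[\\d+\\]\n-+\n(.*?)\n-"

# str_match_all returns one matrix per input string:
# column 1 is the full match, column 2 is the first capture group.
m <- str_match_all(x, pattern)
captured <- m[[1]][, 2]

# Drop the surrounding whitespace left over from the input.
trimmed <- str_trim(captured)
print(trimmed)
# [1] "this?" "that?"
```

Capturing with a group avoids the lookbehind entirely, so the unbounded `\d+` no longer triggers the bounded-length error.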

0 Answers