
I want to take a string variable that contains a lot of text, search it for an upper boundary (call it "UpperBoundary"), then continue from there until the next match of a lower boundary ("LowerBoundary"), and return the text between those two boundaries.

For example, the upper boundary would be `"Country":"` and the lower boundary would be `",`.

This is a snip of what the text I'm dealing with looks like:

> }],"Country":"United States",
> }],"Country":"China",

So I want the results to come back:

> United States
> China

What code or function can I use to do this? I've been searching for a long time and have tried numerous things (stri, grep, find, etc.), but I can't get anything to do what I'm looking for. Thank you for your help!

socialscientist
Steven
  • Your input looks like a JSON snippet. There are lots of nice tools to parse JSON so you don't have to roll your own regex. I'd suggest posting a more complete example... – Gregor Thomas Jul 28 '22 at 00:44
  • Does this answer your question? [Extracting a string between other two strings in R](https://stackoverflow.com/questions/39086400/extracting-a-string-between-other-two-strings-in-r) – socialscientist Jul 28 '22 at 00:55
  • Thanks Gregor for the suggestion, I'll look into it. Your answer below did solve my problem however. @Socialscientist, I did try the solution there as well, but my R didn't like it. It kept throwing me errors. I think it had to do with the quotation marks being part of the string that needed to be the lower boundary. So I moved on trying to find other solutions or explanation why it wasn't working. – Steven Jul 28 '22 at 02:08
  • In the future please then pose your question as why that solution does not work when certain types of data are passed to it - as it is now, this question should probably be marked as a duplicate and closed. – socialscientist Jul 28 '22 at 03:29

1 Answer


Here's a regex method, though as I mentioned in comments I'd strongly recommend using, e.g., the jsonlite package instead.

# input:
x = c('> }],"Country":"United States",', 
'> }],"Country":"China",')

library(stringr)
result = str_extract(x, pattern = '(?<=Country":")[^"]+(?=",)')
result
# [1] "United States" "China" 

Explanation:

  • `(?<=...)` is the look-behind pattern, so we require `Country":"` immediately before the match without including it in the result.
  • `[^"]+` is our main pattern. `^` inside brackets means "not", so we're looking for any character that is not a `"`, and `+` is the quantifier, so one or more non-`"` characters.
  • `(?=...)` is the look-ahead pattern, so we require `",` (a closing quote followed by a comma) immediately after the match, again without including it.
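
If you want this as a reusable function with arbitrary boundaries, as the question asks, here is a minimal base R sketch. It is not the stringr approach above: it swaps the lookarounds for a lazy capture group, and it wraps each boundary in `\Q...\E` so quotes, brackets and other regex metacharacters are matched literally. The name `extract_between` and its arguments are made up for illustration, and it assumes the boundaries do not themselves contain `\E`.

```r
# Hypothetical helper (not part of the original answer): return the text
# between the first `upper` boundary and the next `lower` boundary in each
# element of x, or NA where no such pair is found.
extract_between <- function(x, upper, lower) {
  # \Q...\E makes PCRE treat the boundaries as literal text
  pattern <- paste0("\\Q", upper, "\\E(.*?)\\Q", lower, "\\E")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each hit is c(full match, captured group); character(0) means no match
  vapply(
    hits,
    function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

x <- c('> }],"Country":"United States",', '> }],"Country":"China",')
extract_between(x, upper = '"Country":"', lower = '",')
# [1] "United States" "China"
```

Only the first occurrence per string is returned, and `.` does not match newlines here, so a value spanning several lines would need a different pattern.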
Gregor Thomas
  • Thanks Gregor! This is a great explanation. I would rather use jsonlite if it has an easier method to use. R is still new to me, and using stringr was the first rabbit hole I found that was semi-close to what I was looking for. – Steven Jul 28 '22 at 01:52
  • jsonlite doesn't have an easier method for these little snippets you show, but if you start with a complete JSON object it might be as simple as `fromJSON(your_input)$Country`. Impossible to know without seeing a more complete input. – Gregor Thomas Jul 28 '22 at 12:43
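
For illustration, here is roughly what the jsonlite route from the last comment could look like. The input is entirely made up: it assumes the real data is a complete JSON array of objects that each carry a `Country` key, which the truncated snippets in the question do not confirm. Note that `$` is case-sensitive, so the element name has to match the key exactly.

```r
library(jsonlite)

# Made-up, complete JSON input (an assumption; the question only shows fragments)
your_input <- '[
  {"Name": "A", "Country": "United States"},
  {"Name": "B", "Country": "China"}
]'

# An array of objects with the same keys is simplified to a data frame by default
parsed <- fromJSON(your_input)
parsed$Country
# [1] "United States" "China"
```

If the real input nests `Country` more deeply, the path to it will differ, so it is worth inspecting the parsed result with `str(parsed)` first.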