Extract pattern that occurs multiple times within string

Question

As an extension to my earlier post (Return number from string), I'm trying to extract letters and numbers from a string. But to complicate matters, the patterns might occur more than once. For instance, given this string:

string  <- "[{\"task\":\"T0\",\"value\":[{\"choice\":\"MPL\",\"answers\":{\"HWMN\":\"1\",\"WHCHSDFTHNMLSVSBL\":\"LFTSD\"},\"filters\":{}},{\"choice\":\"NL\",\"answers\":{\"HWMN\":\"1\",\"WHCHSDFTHNMLSVSBL\":\"LFTSD\"},\"filters\":{\"LKSLK\":\"NTLP\",\"PTTRN\":\"STRPS\"}}]}]"

I'd like to extract MPL (you'll see it occurs after choice\":\"), as well as 1 (you'll see it occurs after HWMN\":\"), and LFTSD (which occurs after WHCHSDFTHNMLSVSBL\":\").

I've managed to extract this information individually using:

sub(".*?choice\":\"(.*?)\",\"answers.*", "\\1", string)
sub(".*?HWMN\":\"(.*?)\",\"WHCHSDFTHNMLSVSBL.*", "\\1", string)
sub(".*?WHCHSDFTHNMLSVSBL\":\"(.*?)\"},\"filters.*", "\\1", string)

However, this only works for the first occurrence. How do I search the string and return all occurrences as a list? The example above would therefore yield a list of length 2, similar to:

> output.list

  [[1]]
   [1] "MPL"
   [2] "1"
   [3] "LFTSD"

  [[2]]
   [1] "NL"
   [2] "1"
   [3] "LFTSD"

Ideas?

If you're getting data in json or some other common format, maybe find the R package designed to parse it instead of crudely reinventing that wheel in the form of a series of regexes. — Frank, Dec 13 '16 at 17:37
A heroic combination of `gregexpr` and `regmatches` along with some additional string / list manipulations could probably get you there, but I'd take Frank's advice. — lmo, Dec 13 '16 at 17:38
Ahh, thank you! I didn't realise this was in json format (completely unfamiliar with it). — Ross, Dec 13 '16 at 17:44
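
For reference, the `gregexpr` / `regmatches` route mentioned in the comments might look roughly like the sketch below. It is only an illustration in base R, not a tested answer from the thread. It assumes every record contains each of the three keys exactly once, so that the i-th match for one key belongs to the same record as the i-th match for the others. The names `keys` and `per_key` are made up for this example.

```r
# The string from the question (JSON stored as text).
string <- "[{\"task\":\"T0\",\"value\":[{\"choice\":\"MPL\",\"answers\":{\"HWMN\":\"1\",\"WHCHSDFTHNMLSVSBL\":\"LFTSD\"},\"filters\":{}},{\"choice\":\"NL\",\"answers\":{\"HWMN\":\"1\",\"WHCHSDFTHNMLSVSBL\":\"LFTSD\"},\"filters\":{\"LKSLK\":\"NTLP\",\"PTTRN\":\"STRPS\"}}]}]"

# Keys whose values we want (hypothetical helper vector).
keys <- c("choice", "HWMN", "WHCHSDFTHNMLSVSBL")

# For each key, collect every value that follows "key":" in the string.
# The lookbehind (?<=...) needs perl = TRUE; [^"]+ stops at the closing quote.
per_key <- lapply(keys, function(k) {
  pattern <- paste0("(?<=\"", k, "\":\")[^\"]+")
  regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]
})

# Regroup by occurrence: element i holds the i-th value found for each key.
# This relies on the assumption that all keys occur once per record.
output.list <- lapply(seq_along(per_key[[1]]), function(i) {
  vapply(per_key, function(v) v[i], character(1))
})

print(output.list)
```

If a key were missing from some record, the positions would no longer line up, which is one reason a real JSON parser (for example `fromJSON` from the jsonlite package, as Frank's comment suggests) is likely the more robust choice.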
