As an extension to my earlier post (Return number from string), I'm trying to extract letters and numbers from a string. But to complicate matters, the patterns might occur more than once. For instance, given this string:

string  <- "[{\"task\":\"T0\",\"value\":[{\"choice\":\"MPL\",\"answers\":{\"HWMN\":\"1\",\"WHCHSDFTHNMLSVSBL\":\"LFTSD\"},\"filters\":{}},{\"choice\":\"NL\",\"answers\":{\"HWMN\":\"1\",\"WHCHSDFTHNMLSVSBL\":\"LFTSD\"},\"filters\":{\"LKSLK\":\"NTLP\",\"PTTRN\":\"STRPS\"}}]}]"

I'd like to extract MPL (you'll see it occurs after choice\":\"), as well as 1 (you'll see it occurs after HWMN\":\"), and LFTSD (which occurs after WHCHSDFTHNMLSVSBL\":\").

I've managed to extract this information individually using:

sub(".*?choice\":\"(.*?)\",\"answers.*", "\\1", string)
sub(".*?HWMN\":\"(.*?)\",\"WHCHSDFTHNMLSVSBL.*", "\\1", string)
sub(".*?WHCHSDFTHNMLSVSBL\":\"(.*?)\"},\"filters.*", "\\1", string) 

But this only works for the first occurrence. How do I search the string and return all occurrences as a list? The example above would therefore produce a list of length 2, similar to:

> output.list

  [[1]]
   [1] "MPL"
   [2] "1"
   [3] "LFTSD"

  [[2]]
   [1] "NL"
   [2] "1"
   [3] "LFTSD"

Ideas?

Ross

  • Have you tried `gsub` instead? – Pierre L Dec 13 '16 at 17:37
  • If you're getting data in json or some other common format, maybe find the R package designed to parse it instead of crudely reinventing that wheel in the form of a series of regexes. – Frank Dec 13 '16 at 17:37
  • A heroic combination of `gregexpr` and `regmatches` along with some additional string / list manipulations could probably get you there, but I'd take Frank's advice. – lmo Dec 13 '16 at 17:38
  • Ahh, thank you! I didn't realise this was in json format (completely unfamiliar with it). – Ross Dec 13 '16 at 17:44
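
For reference, the `gregexpr` / `regmatches` route mentioned in the comments might look roughly like the following. This is only a minimal base-R sketch, not a definitive answer: the helper name `extract_after` is hypothetical, and it assumes each key occurs exactly once per choice block so that the three result vectors line up. Since the input is JSON, a dedicated parser (for example the jsonlite package) is probably the more robust option, as Frank suggests.

```r
string <- "[{\"task\":\"T0\",\"value\":[{\"choice\":\"MPL\",\"answers\":{\"HWMN\":\"1\",\"WHCHSDFTHNMLSVSBL\":\"LFTSD\"},\"filters\":{}},{\"choice\":\"NL\",\"answers\":{\"HWMN\":\"1\",\"WHCHSDFTHNMLSVSBL\":\"LFTSD\"},\"filters\":{\"LKSLK\":\"NTLP\",\"PTTRN\":\"STRPS\"}}]}]"

# Hypothetical helper: return every value that follows key":" in x.
# The lookbehind requires perl = TRUE; gregexpr finds all matches, not just the first.
extract_after <- function(key, x) {
  pattern <- paste0("(?<=", key, "\":\")[^\"]*")
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

choice <- extract_after("choice", string)
hwmn   <- extract_after("HWMN", string)
side   <- extract_after("WHCHSDFTHNMLSVSBL", string)

# Assumption: the i-th match of each key belongs to the i-th choice block.
output.list <- lapply(seq_along(choice), function(i) c(choice[i], hwmn[i], side[i]))
output.list
```

Unlike `sub`, which rewrites the string around the first match only, `gregexpr` reports the position of every match and `regmatches` pulls the matched text out, so each key yields one value per occurrence.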
