
Let's say I have the following string:

Some crap string here...(TRACK "title1" F (S #h88 (P #m6) (P #m31)) (S #k3 (P #m58) (P #m58)))(TRACK "title2" P (S #a54 (P #r8)) (S #v59 (P #a25) (P #y82)))...Some other crap string here

From this string I need to extract the following data:

  1. title1
  2. F
  3. (S #h88 (P #m6) (P #m31)) and (S #k3 (P #m58) (P #m58))

and

  1. title2
  2. P
  3. (S #a54 (P #r8)) and (S #v59 (P #a25) (P #y82))

where

  1. is some kind of title.
  2. is some kind of status.
  3. is some kind of list of lists, like (S #xx (P #xx)).

With my limited regex knowledge, I can get 1 and 2, but only the first part of 3.
(S #xx (P #xx)) can exist multiple times and also the inner (P #xx) can exist multiple times.

I've tried many regular expressions and consulted a lot of posts, but I keep having trouble extracting the data as required.

So now I'm back at \(TRACK "(.*?)" ([P|F]) (\(S.*?\)\)) which only captures the first of two lists in this example string.

See: https://regex101.com/r/FM0ZZR/1

What do I need to do to get all lists as described?
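Running the question's pattern against the first TRACK block shows why it stops early. This is a minimal sketch in Python's `re` module (the question does not say which language is in use, so Python is an assumption here), with the pattern copied unchanged from the question.

```python
import re

# First TRACK block from the sample string in the question.
text = ('(TRACK "title1" F (S #h88 (P #m6) (P #m31)) '
        '(S #k3 (P #m58) (P #m58)))')

# The question's pattern, unchanged. Note that [P|F] is a character class,
# so it matches "P", "F" or a literal "|"; [PF] is what was intended.
pattern = r'\(TRACK "(.*?)" ([P|F]) (\(S.*?\)\))'

m = re.search(pattern, text)
# The lazy .*? stops at the FIRST "))" it meets, which closes the first
# (S ...) list, so the second list never becomes part of group 3.
print(m.group(1))  # title1
print(m.group(2))  # F
print(m.group(3))  # (S #h88 (P #m6) (P #m31))
```

Any pattern that ends a list at "the first `))`" has this problem; matching the lists properly means tracking how deeply the parentheses are nested.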

DigiLive

1 Answer


You can use

\(TRACK\s+"([^"]*)"\s+([PF])((?:\s+(\([SP](?:[^()]*+|(?-1))*\)))*\))

See the regex demo.

Details

  • \(TRACK - a (TRACK substring
  • \s+ - one or more whitespaces
  • " - a " char
  • ([^"]*) - Group 1: any zero or more chars other than "
  • " - a " char
  • \s+ - one or more whitespaces
  • ([PF]) - Group 2: P or F
  • ((?:\s+(\([SP](?:[^()]*+|(?-1))*\)))*\)) - Group 3:
    • (?:\s+(\([SP](?:[^()]*+|(?-1))*\)))* - zero or more repetitions of
      • \s+ - one or more whitespaces
      • (\([SP](?:[^()]*+|(?-1))*\)) - Group 4 (technical, necessary for recursion):
        • \( - a ( char
        • [SP] - S or P
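        • (?:[^()]*+|(?-1))* - zero or more occurrences of either a run of chars other than ( and ) (matched possessively, so without backtracking) or a recursive match of the most recently opened capturing group's pattern, i.e. Group 4 itself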
        • \) - a ) char
    • \) - a ) char.
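Python's built-in `re` module has no equivalent of PCRE's `(?-1)` recursion, so the sketch below swaps that technique for a plain depth counter: a regex finds each `(TRACK "title" F` header, and a loop then collects every top-level parenthesised list until it reaches the `)` that closes the TRACK block. This is an illustration rather than the answer's method; the names `HEADER` and `parse_tracks` are hypothetical, it assumes the parentheses in each block are balanced, and unlike `[SP]` in the answer it accepts any top-level group, not only ones starting with S or P.

```python
import re

# Start of a block: (TRACK "title" F  -> group 1 = title, group 2 = status
HEADER = re.compile(r'\(TRACK\s+"([^"]*)"\s+([PF])')


def parse_tracks(text):
    """Return a list of (title, status, lists) tuples, one per TRACK block.

    A depth counter stands in for PCRE's (?-1) recursion, which Python's
    built-in re module does not support.
    """
    results = []
    pos = 0
    while True:
        m = HEADER.search(text, pos)
        if m is None:
            return results
        lists = []
        depth = 0
        start = 0
        i = m.end()
        while i < len(text):
            ch = text[i]
            if ch == '(':
                if depth == 0:
                    start = i  # a new top-level list begins here
                depth += 1
            elif ch == ')':
                if depth == 0:
                    break  # this ")" closes the TRACK block itself
                depth -= 1
                if depth == 0:
                    lists.append(text[start:i + 1])  # a top-level list ended
            i += 1
        results.append((m.group(1), m.group(2), lists))
        pos = i + 1  # keep searching after the block's closing ")"


sample = ('Some crap string here...(TRACK "title1" F (S #h88 (P #m6) (P #m31)) '
          '(S #k3 (P #m58) (P #m58)))(TRACK "title2" P (S #a54 (P #r8)) '
          '(S #v59 (P #a25) (P #y82)))...Some other crap string here')

for title, status, lists in parse_tracks(sample):
    print(title, status, lists)
# title1 F ['(S #h88 (P #m6) (P #m31))', '(S #k3 (P #m58) (P #m58))']
# title2 P ['(S #a54 (P #r8))', '(S #v59 (P #a25) (P #y82))']
```

Because each list is returned as a separate string, this sidesteps the "only the last capture is kept" limitation of a repeated capturing group.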
Wiktor Stribiżew
  • Thank you very much, but unfortunately this regex does not capture the first (S #xx (P #xx)) of each match, e.g. (S #h88 (P #m6) (P #m31)) – DigiLive Apr 14 '21 at 15:17
  • @DigiLive It does capture it, but since it is a [repeated capturing group](https://stackoverflow.com/questions/37003623/how-to-capture-multiple-repeated-groups), only the last value is kept in the group memory buffer. You cannot access each separate capture with a PCRE pattern, as PCRE has no support for a capturing group stack. You could write it like [this](https://regex101.com/r/PusvWB/4), `(?:\G(?!\A)|\(TRACK\s+"([^"]*)"\s+([PF]))\s+(\([SP](?:[^()]*+|(?-1))*\))\)?`, but the result is not very practical to use, or it will require quite a lot of extra code. – Wiktor Stribiżew Apr 14 '21 at 15:22
  • Cool, recursing the subpattern with `(?-1)`; those patterns are awesome ++ – The fourth bird Apr 14 '21 at 16:27
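The repeated-capturing-group behaviour described in the comments can be reproduced with Python's `re` module, which also keeps only the last repetition. Since `re` cannot recurse with `(?-1)`, this sketch swaps the recursive subpattern for a fixed-depth one. `S_LIST` and `HEAD` are hypothetical names, and the pattern assumes lists nest only as S containing P, with `#xx` tokens made of word characters.

```python
import re

block = ('(TRACK "title1" F (S #h88 (P #m6) (P #m31)) '
         '(S #k3 (P #m58) (P #m58)))')

# Hypothetical fixed-depth replacement for the recursive (?-1) subpattern:
# one (S #xx (P #xx)...) list, assuming lists nest only S -> P and that the
# #xx tokens consist of word characters.
S_LIST = r'\(S\s+#\w+(?:\s+\(P\s+#\w+\))*\)'
HEAD = r'\(TRACK\s+"([^"]*)"\s+([PF])'

# One pass, repeated capturing group: every list is matched, but group 3 only
# keeps the text of the LAST repetition.
one_pass = re.search(HEAD + r'(?:\s+(' + S_LIST + r'))*\)', block)
print(one_pass.group(3))
# (S #k3 (P #m58) (P #m58))

# Two passes: capture the whole run of lists as one group, then split it.
outer = re.search(HEAD + r'((?:\s+' + S_LIST + r')*)\)', block)
print(re.findall(S_LIST, outer.group(3)))
# ['(S #h88 (P #m6) (P #m31))', '(S #k3 (P #m58) (P #m58))']
```

The same two-pass idea should carry over to PCRE: run the answer's pattern, then run a second match over the text captured in its Group 3 to pull out each list individually.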