I'm using preg_match_all() like this:

preg_match_all('/("user":"(.*?)".*?-->)/', $input_lines, $output_array);

The idea is to capture whatever comes after "user" inside a commented-out block (it's a long story). So let's say $input_lines is something like:

<!-- block {"user":"josh-hart", 1234566 ,"hide":false} /-->
<!-- block {"user":"jalen-brunson", 7744633 ,"hide":true} /-->
<!-- block {"user":"julius-randle", 333333,"hide":false} /-->
<!-- block {"user":"obi-toppin", 4hh3n33n33n, /-->
<!-- block {"user":"rj-barrett", nmremxxx!! ,"hide":true} /-->
<!-- block {"user":"mitch-robinson",yahaoao /-->

I want it to match the user. But here's the thing: I only want the user if "hide":true does not appear before the /-->. So for this string I would want the matches to be:

josh-hart, julius-randle, obi-toppin, mitch-robinson

What is this called in regex terms, and how do I do it?

pg.
  • The term you are looking for is "lookahead", https://www.regular-expressions.info/lookaround.html – CBroe Jun 21 '23 at 06:14
  • Here's an example: [`"user":"([^"]*?)"(?!(?:(?!\/-->).)*"hide"\s*:\s*true)`](https://regex101.com/r/BuF2Dk/1) – Hao Wu Jun 21 '23 at 06:52
  • You can also use [`(*SKIP)(*F)`](https://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex) to skip something, and [`\K` to reset](http://www.rexegg.com/regex-php.html#K) the start of the match. Something like [`"user":"\K[^"]+(?:[^>]*?"hide":true(*SKIP)(*F))?`](https://regex101.com/r/Q0wGeS/1) might suffice for your data. – bobble bubble Jun 21 '23 at 07:58

3 Answers

You may try this:

user":"([^"]+)"(?!.*"hide":true(?=.*\/-->))

Explanation:

  • user":"([^"]+)" - after matching user":", it matches one or more non-quote characters up to the closing ".
  • () makes the username capture group 1.
  • (?!.*"hide":true ...) - negative lookahead: the match is rejected if "hide":true appears later on the same line (by default . does not match a newline).
  • (?=.*\/-->) - positive lookahead nested inside the negative one: that "hide":true only counts if a /--> follows it.
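As a minimal sketch of how this pattern could be dropped into the preg_match_all() call from the question: the sample input is copied from the question, and reading the usernames from capture group 1 follows the explanation above. The surrounding script is only an illustration, not part of the original answer.

```php
<?php
// Sample input copied from the question.
$input_lines = <<<'TXT'
<!-- block {"user":"josh-hart", 1234566 ,"hide":false} /-->
<!-- block {"user":"jalen-brunson", 7744633 ,"hide":true} /-->
<!-- block {"user":"julius-randle", 333333,"hide":false} /-->
<!-- block {"user":"obi-toppin", 4hh3n33n33n, /-->
<!-- block {"user":"rj-barrett", nmremxxx!! ,"hide":true} /-->
<!-- block {"user":"mitch-robinson",yahaoao /-->
TXT;

// Lookahead pattern from this answer. Without the s flag, `.` stops at
// line breaks, so each check stays within a single comment line.
$pattern = '/user":"([^"]+)"(?!.*"hide":true(?=.*\/-->))/';

preg_match_all($pattern, $input_lines, $output_array);

// Capture group 1 holds the usernames.
echo implode(', ', $output_array[1]), PHP_EOL;
// josh-hart, julius-randle, obi-toppin, mitch-robinson
```

Because the check relies on . not crossing line breaks, this sketch assumes each comment sits on its own line, as in the sample data.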

Demo

Mustofa Rizwan

As suggested by @bobblebubble, you can use the (*SKIP)(*FAIL) combo to skip whatever you don't want to match:

^                     # Match at the start of a line
.*                    # anything
"hide"\s*:\s*true     # followed by '"hide":true'
.*                    # then anything until the end of the line
                      # (i.e. a line that contains '"hide":true')
(*SKIP)(*FAIL)        # which we tell the engine to skip entirely
|                     # or
"user"\s*:\s*"\K      # '"user":"', which we forfeit,
[^"]+                 # followed by 1 or more non-quote characters.
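A rough sketch of running this from PHP, under two assumptions that the commented pattern implies but does not state: the x flag (so whitespace and # comments inside the pattern are ignored) and the m flag (so ^ matches at the start of every line). The ~ delimiter and the variable names are illustrative choices only.

```php
<?php
// Sample input copied from the question.
$input_lines = <<<'TXT'
<!-- block {"user":"josh-hart", 1234566 ,"hide":false} /-->
<!-- block {"user":"jalen-brunson", 7744633 ,"hide":true} /-->
<!-- block {"user":"julius-randle", 333333,"hide":false} /-->
<!-- block {"user":"obi-toppin", 4hh3n33n33n, /-->
<!-- block {"user":"rj-barrett", nmremxxx!! ,"hide":true} /-->
<!-- block {"user":"mitch-robinson",yahaoao /-->
TXT;

// x: free-spacing mode with # comments; m: ^ matches at each line start.
$pattern = <<<'REGEX'
~
  ^ .* "hide" \s* : \s* true .*   # a whole line containing "hide":true
  (*SKIP)(*FAIL)                  # discard it and resume after it
|
  "user" \s* : \s* " \K           # the "user":" prefix, dropped by \K
  [^"]+                           # the username itself
~xm
REGEX;

preg_match_all($pattern, $input_lines, $matches);

// With \K there is no capture group: the full match is the username.
echo implode(', ', $matches[0]), PHP_EOL;
// josh-hart, julius-randle, obi-toppin, mitch-robinson
```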

Try it on regex101.com.

InSync

Assuming that a comment runs from <!-- to --> and neither marker is used anywhere in between, you can first get the comments that contain "hide":true out of the way by matching <!-- ... "hide":true ... --> without crossing the opening or closing of a comment.

Then you can get a single match of the username, still in between the comment markers and independent of the order of appearance.

<!--(?:(?!-->|"hide":).)*+"hide":true\b(?:(?!-->).)*/-->(*SKIP)(*F)|<!--(?:(?!-->|"user":).)*"user":"\K[^"]+(?="(?:(?!-->).)*-->)

The pattern matches:

  • <!-- Match literally
  • (?:(?!-->|"hide":).)*+ Optionally repeat matching any character that does not start either --> or "hide":, using a tempered greedy token (made possessive by *+)
  • "hide":true\b Match "hide":true followed by a word boundary to prevent a partial word match
  • (?:(?!-->).)*/--> Match until the closing -->
  • (*SKIP)(*F) Skip the current match
  • | Or
  • <!-- Match literally
  • (?:(?!-->|"user":).)* Optionally repeat matching any character that does not start either --> or "user":
  • "user":"\K Match "user":" and forget what is matched so far
  • [^"]+ Match 1+ chars other than " (the username that you want to match)
  • (?="(?:(?!-->).)*-->) Assert the closing " and, further to the right, the closing -->

Note that you can make the username match more specific. As written, [^"]+ matches one or more characters other than a double quote, which can include a space or a newline. If you want to match only non-whitespace characters other than a double quote, then you can change it to [^\s"]+
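The pattern above could be tried from PHP roughly as follows. This is only a sketch: the ~ delimiter is an assumption made so that the / inside the pattern needs no escaping, and the sample input is the one from the question.

```php
<?php
// Sample input copied from the question.
$input_lines = <<<'TXT'
<!-- block {"user":"josh-hart", 1234566 ,"hide":false} /-->
<!-- block {"user":"jalen-brunson", 7744633 ,"hide":true} /-->
<!-- block {"user":"julius-randle", 333333,"hide":false} /-->
<!-- block {"user":"obi-toppin", 4hh3n33n33n, /-->
<!-- block {"user":"rj-barrett", nmremxxx!! ,"hide":true} /-->
<!-- block {"user":"mitch-robinson",yahaoao /-->
TXT;

// First alternative: consume and discard comments containing "hide":true.
// Second alternative: match only the username inside a remaining comment.
$pattern = '~<!--(?:(?!-->|"hide":).)*+"hide":true\b(?:(?!-->).)*/-->(*SKIP)(*F)'
         . '|<!--(?:(?!-->|"user":).)*"user":"\K[^"]+(?="(?:(?!-->).)*-->)~';

preg_match_all($pattern, $input_lines, $matches);

// \K resets the match start, so the full match is the username.
echo implode(', ', $matches[0]), PHP_EOL;
// josh-hart, julius-randle, obi-toppin, mitch-robinson
```

Without the s flag, . does not match a newline, so this sketch only handles comments that fit on one line.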

See a regex demo.

The fourth bird