Regex Match Second occurrence if not First

Question

I have JSON output like the following in Zabbix:

{
  "body": {
    "metricsArray": [
      {
        "name": "free-aa-bb2-123x123Profiles",
        "units": "profiles",
        "value": 14
      },
      {
        "name": "free_aa_bb2_123x123Profiles",
        "units": "profiles",
        "value": 14
      }
    ],
    "name": "regionxxx",
    "timeStamp": "2022-01-20T04:58:29.875Z"
  }
}

I was using this regex:

"free[_-]aa[_-]bb2[_-]123x123Profiles"[^}]*

hoping to get the output as

"free_aa_bb2_123x123Profiles","units":"profiles","value":14

if both free-aa-bb2-123x123Profiles and free_aa_bb2_123x123Profiles are present.
Or:

"free-aa-bb2-123x123Profiles","units":"profiles","value":14

if only free-aa-bb2-123x123Profiles is present.
Or:

"free_aa_bb2_123x123Profiles","units":"profiles","value":14

if only free_aa_bb2_123x123Profiles is present.

But the output I was getting is always:

"free-aa-bb2-123x123Profiles","units":"profiles","value":14
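
The result above is consistent with how a regex search works: it reports the leftmost position where the pattern succeeds, and `[_-]` accepts the hyphenated name, which comes first. The following is a minimal sketch run outside Zabbix with Python's `re` module. The compact `payload` string is an illustrative copy of the sample data, and the variable names are assumptions.

```python
import re

# Compact form of the sample payload from the question (illustrative copy).
payload = (
    '{"body":{"metricsArray":['
    '{"name":"free-aa-bb2-123x123Profiles","units":"profiles","value":14},'
    '{"name":"free_aa_bb2_123x123Profiles","units":"profiles","value":14}],'
    '"name":"regionxxx","timeStamp":"2022-01-20T04:58:29.875Z"}}'
)

pattern = re.compile(r'"free[_-]aa[_-]bb2[_-]123x123Profiles"[^}]*')

# search() returns the leftmost match, which is the hyphenated entry.
first = pattern.search(payload).group(0)

# findall() shows that the underscore entry also matches, but only as the second hit.
all_matches = pattern.findall(payload)
```

So the pattern itself cannot prefer the second entry; that preference has to be expressed some other way, for example with a lookahead or by parsing the JSON.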

TIA

I don't see a JSON parser option in Zabbix UI. All i have is a field in UI to put the regex or a javascript field to add js code. — Raghavendra, Mar 20 '23 at 07:58
Then use JS: [Safely turning a JSON string into an object](https://stackoverflow.com/q/45015). — InSync, Mar 20 '23 at 07:59
Sorry. I've never used JS in my life. I would have figured it out if it was in Python or BASH — Raghavendra, Mar 20 '23 at 08:29
JSONPath is supported in Zabbix since version 3.4 https://www.zabbix.com/documentation/6.0/en/manual/config/items/preprocessing/jsonpath_functionality?hl=JSONPath — Iron Bishop, Mar 20 '23 at 12:41
You probably need a regex in the form of something like this `"free[_-]aa[_-]bb2[_-]123x123Profiles"\s*,\s*"units"\s*:\s*"profiles"\s*,\s*"value"\s*:\s*14` https://regex101.com/r/qva4NP/1 — sln, Mar 20 '23 at 22:20

score 0 · Answer 1 · answered Mar 21 '23 at 00:09

This is an interesting problem from the standpoint of using a regular expression to obtain the desired match, even though it may be preferable to convert the JSON string to a hash and go from there.
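
For comparison, here is a minimal sketch of the parse-first alternative, written in Python only to check the selection logic outside Zabbix (inside Zabbix the same idea would have to go into the JavaScript preprocessing field). The function name `pick_metric` and its parameters are hypothetical, and the rule it applies (prefer the underscore spelling when both exist) is my reading of the question.

```python
import json

def pick_metric(payload, hyphen_name):
    """Return the underscore twin of hyphen_name if present,
    else the hyphenated entry, else None."""
    metrics = json.loads(payload)["body"]["metricsArray"]
    by_name = {m["name"]: m for m in metrics}
    underscore_name = hyphen_name.replace("-", "_")
    if underscore_name in by_name:
        return by_name[underscore_name]
    return by_name.get(hyphen_name)

# Sample data rebuilt from the question.
sample = json.dumps({
    "body": {
        "metricsArray": [
            {"name": "free-aa-bb2-123x123Profiles", "units": "profiles", "value": 14},
            {"name": "free_aa_bb2_123x123Profiles", "units": "profiles", "value": 14},
        ],
        "name": "regionxxx",
        "timeStamp": "2022-01-20T04:58:29.875Z",
    }
})

chosen = pick_metric(sample, "free-aa-bb2-123x123Profiles")
```

With the sample data, `chosen` is the entry whose name uses underscores; if that entry were removed, the hyphenated one would be returned instead.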

The following regular expression will match zero, one or two substrings. If there is at least one match the first match will be the one of interest. (If there are two matches disregard the second one.)

In the example given in the question I have assumed that the values of "name" ("free-aa-bb2-123x123Profiles" and "free_aa_bb2_123x123Profiles") are placeholders for strings comprised of four substrings separated by hyphens in one case and underscores in the other, the substrings being comprised of word characters (letters, digits and underscores, represented by \w+ in regular expressions).

I have further assumed that the "hyphen" hash representation is the one of interest (and therefore is the first match) if there is no following "underscore" hash representation that is identical except for the hyphens being replaced by underscores; else the underscore hash representation is the first match. In the example the hash representation "free_aa_bb2_123x123Profiles" would be selected. If, however, that string were changed to, say, "zzzz_aa_bb2_123x123Profiles", "free-aa-bb2-123x123Profiles" would be the first match, so it would be selected.

Note that Zabbix uses the PCRE regex engine.

You can match the regular expression below, which I've written in extended mode (invoked with the x flag), sometimes called free-spacing mode. That mode allows one to enter comments to make the expression self-documenting, as well as extra whitespace to improve readability. Because the regex engine removes comments and whitespace in this mode before parsing the expression further, it is necessary to protect any spaces that are meant to be part of the expression. That is generally done by placing each space in a character class ([ ]), which is what I have done below, or by escaping it (\ ).
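
As a small illustration of why the spaces need protecting, here is a sketch that uses Python's `re.VERBOSE` flag as a stand-in for the x flag (an assumption made for convenience; the expression in this answer targets PCRE). The variable names are illustrative.

```python
import re

# Free-spacing mode: layout whitespace and '#' comments are ignored,
# so a literal space is written as [ ].
protected = re.compile(r'''
    "name":[ ]"   # the space inside the character class is kept
    (\w+)         # capture the run of word characters that follows
''', re.VERBOSE)

# Same pattern with a bare space: the space is discarded with the layout
# whitespace, so the pattern now requires "name":" with no space at all.
unprotected = re.compile(r'''
    "name": "     # this space is silently dropped
    (\w+)
''', re.VERBOSE)

text = '"name": "free"'
protected_hit = protected.search(text)
unprotected_hit = unprotected.search(text)
```

Here `protected_hit` captures `free`, while `unprotected_hit` is `None` because the bare space never reaches the compiled pattern.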

I have also invoked single-line (or DOTALL) mode (invoked with the s flag), which causes . to match all characters (without doing so . does not match line terminators). The regular expression is as follows.

\{\s+"name":[ ]"     # match '{' then 1+ whitespace chars, then '"name": "'
(\w+)                # match 1+ word chars, save to capture group 1
-                    # match '-'
(\w+)                # match 1+ word chars, save to capture group 2
-                    # match '-'
(\w+)                # match 1+ word chars, save to capture group 3
-                    # match '-'
(\w+)                # match 1+ word chars, save to capture group 4
(                    # begin capture group 5
  ",\s+"units":[ ]   # match '",', then 1+ whitespaces then '"units": '
  "\w+"              # match '"', then 1+ word chars, then '"'
  ,\s+"value":[ ]    # match ',', then 1+ whitespaces then '"value": '
  \d+                # match 1+ digits
  \s+\}              # match 1+ whitespaces then '}'
)                    # end capture group 5
(?!                  # begin negative lookahead
  .*                 # match 0+ chars
  \{\s+"name":[ ]"   # match '{' then 1+ whitespace chars then '"name": ' then '"'
  \1_                # match contents of capture group 1 then '_'
  \2_                # match contents of capture group 2 then '_'
  \3_                # match contents of capture group 3 then '_'
  \4                 # match contents of capture group 4
  \5                 # match contents of capture group 5
)                    # end of negative lookahead 
|                    # or
\{\s+"name":[ ]"     # match '{' then 1+ whitespace chars, then '"name": "'
\w+                  # match 1+ word chars
_                    # match '_'
\w+                  # match 1+ word chars
_                    # match '_'
\w+                  # match 1+ word chars
_                    # match '_'
\w+                  # match 1+ word chars
(?5)                 # execute the subpattern of capture group 5

Demo

In the above, \1 (for example) requires that the contents of capture group 1 be matched at the current string location. By contrast, (?5) directs that the subpattern used to capture the contents of capture group 5 be applied at the current string location. That is called a regex subroutine or subexpression call.
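
To try the logic outside Zabbix, here is a sketch that ports the expression to Python's `re` module. It is not the exact PCRE expression: `re` has no subroutine calls, so the subpattern of group 5 is written out a second time where `(?5)` appears, while the backreferences `\1` to `\5` are kept. The helpers `make_payload` and `first_name` are hypothetical, and the test data is pretty-printed with `json.dumps(..., indent=2)` so that it has the whitespace the expression expects.

```python
import json
import re

PATTERN = re.compile(r'''
    \{\s+"name":[ ]"
    (\w+)-(\w+)-(\w+)-(\w+)        # groups 1-4: hyphen-separated parts
    (                              # group 5: remainder of the object
      ",\s+"units":[ ]"\w+"
      ,\s+"value":[ ]\d+
      \s+\}
    )
    (?!                            # reject if an underscore twin follows
      .*\{\s+"name":[ ]"\1_\2_\3_\4\5
    )
  |
    \{\s+"name":[ ]"
    \w+_\w+_\w+_\w+                # underscore-separated name
    ",\s+"units":[ ]"\w+"          # subpattern of group 5 written out again,
    ,\s+"value":[ ]\d+             # standing in for the subroutine call
    \s+\}
''', re.VERBOSE | re.DOTALL)

def make_payload(names):
    """Build pretty-printed JSON shaped like the sample in the question."""
    metrics = [{"name": n, "units": "profiles", "value": 14} for n in names]
    return json.dumps({"body": {"metricsArray": metrics}}, indent=2)

def first_name(payload):
    """Return the "name" of the first matched object, or None if nothing matches."""
    m = PATTERN.search(payload)
    return None if m is None else json.loads(m.group(0))["name"]

both = first_name(make_payload(
    ["free-aa-bb2-123x123Profiles", "free_aa_bb2_123x123Profiles"]))
hyphen_only = first_name(make_payload(["free-aa-bb2-123x123Profiles"]))
different_twin = first_name(make_payload(
    ["free-aa-bb2-123x123Profiles", "zzzz_aa_bb2_123x123Profiles"]))
```

With both spellings present the first match is the underscore entry; with only the hyphenated entry, or with a non-matching underscore name after it, the first match is the hyphenated entry.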
