This is an interesting problem from the standpoint of using a regular expression to obtain the desired match, even though it may be preferable to convert the JSON string to a hash and go from there.
The following regular expression will match zero, one or two substrings. If there is at least one match the first match will be the one of interest. (If there are two matches disregard the second one.)
In the example given in the question I have assumed that the values of "name" ("free-aa-bb2-123x123Profiles" and "free_aa_bb2_123x123Profiles") are placeholders for strings composed of four substrings separated by hyphens in one case and underscores in the other, each substring consisting of word characters (letters, digits and underscores, one or more of which are matched by \w+ in regular expressions).
I have further assumed that the "hyphen" hash representation is the one of interest (and therefore is the first match) if there is no following "underscore" hash representation that is identical except for the hyphens being replaced by underscores; else the underscore hash representation is the only match. In the example the hash representation "free_aa_bb2_123x123Profiles" would be selected. If, however, that string were changed to, say, "zzzz_aa_bb2_123x123Profiles", "free-aa-bb2-123x123Profiles" would be the first match so it would be selected.
Note that Zabbix uses the PCRE regex engine.
You can match the regular expression below, which I've written in extended mode (invoked with the x flag), sometimes called free-spacing mode. That mode allows one to enter comments to make the expression self-documenting, as well as extra whitespace to improve readability. Because the regex engine removes comments and whitespace in this mode before parsing the expression further, any spaces that are meant to be part of the expression must be protected. That is generally done by placing each space in a character class ([ ]), which is what I have done below, or by escaping it (\ ).
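The effect of free-spacing mode on literal spaces can be seen in a small Python sketch. It uses Python's re.VERBOSE flag as a stand-in for the PCRE x flag, on the assumption that the two behave the same way for these simple cases; the variable names are illustrative only.

```python
import re

subject = '"name": "free"'

# The bare space is stripped in verbose mode, so this pattern is really '"name":"'.
bare = re.compile(r'"name": "', re.VERBOSE)

# A space inside a character class survives, and a comment can follow it.
in_class = re.compile(r'"name":[ ]"  # colon, literal space, quote', re.VERBOSE)

# An escaped space survives as well.
escaped = re.compile(r'"name":\ "', re.VERBOSE)

print(bare.search(subject) is None)      # True
print(in_class.search(subject).group())  # "name": "
print(escaped.search(subject).group())   # "name": "
```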
I have also invoked single-line (or DOTALL) mode (invoked with the s flag), which causes . to match all characters (without it . does not match line terminators). The regular expression is as follows.
\{\s+"name":[ ]" # match '{' then 1+ whitespace chars then '"name": ' then '"'
(\w+) # match 1+ word chars, save to capture group 1
- # match '-'
(\w+) # match 1+ word chars, save to capture group 2
- # match '-'
(\w+) # match 1+ word chars, save to capture group 3
- # match '-'
(\w+) # match 1+ word chars, save to capture group 4
( # begin capture group 5
",\s+"units":[ ] # match '",', then 1+ whitespaces then '"units": '
"\w+" # match '"', then 1+ word chars, then '"'
,\s+"value":[ ] # match ',', then 1+ whitespaces then '"value": '
\d+ # match 1+ digits
\s+\} # match 1+ whitespaces then '}'
) # end capture group 5
(?! # begin negative lookahead
.* # match 0+ chars
\{\s+"name":[ ]" # match '{' then 1+ whitespace chars then '"name": ' then '"'
\1_ # match contents of capture group 1 then '_'
\2_ # match contents of capture group 2 then '_'
\3_ # match contents of capture group 3 then '_'
\4 # match contents of capture group 4
\5 # match contents of capture group 5
) # end of negative lookahead
| # or
\{\s+"name":[ ]" # match '{' then 1+ whitespace chars then '"name": ' then '"'
\w+ # match 1+ word chars
_ # match '_'
\w+ # match 1+ word chars
_ # match '_'
\w+ # match 1+ word chars
_ # match '_'
\w+ # match 1+ word chars
(?5) # execute the code used to construct capture group 5
In the above, \1 (for example) requires that the contents of capture group 1 be matched at the current string location. By contrast, (?5) directs that the code used to capture the contents of capture group 5 be invoked at the current string location. That is called a regex subroutine or subexpression call.
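To experiment with the expression outside Zabbix, here is a rough Python port. Python's standard re module does not support subroutine calls such as (?5), so this sketch inlines the text of the group 5 subpattern (held in a variable I have called TAIL) where PCRE would call it, and uses re.VERBOSE and re.DOTALL in place of the x and s flags. The sample "units" and "value" data and the names TAIL, PATTERN and first_match are assumptions made for illustration.

```python
import re

# Subpattern for everything after the name: units, value and the closing brace.
TAIL = r'''
    ",\s+"units":[ ]   # quote and comma, whitespace, then the units key
    "\w+"              # quoted run of word chars
    ,\s+"value":[ ]    # comma, whitespace, then the value key
    \d+                # 1+ digits
    \s+\}              # whitespace, then the closing brace
'''

PATTERN = re.compile(
    r'''
    \{\s+"name":[ ]"         # brace, whitespace, name key and opening quote
    (\w+)-(\w+)-(\w+)-(\w+)  # four hyphen-separated parts, groups 1 to 4
    (                        # begin group 5
    ''' + TAIL + r'''
    )                        # end group 5
    (?!                      # negative lookahead: no underscore twin later on
      .*
      \{\s+"name":[ ]"
      \1_\2_\3_\4            # same four parts joined by underscores
      \5                     # backreference: the same text group 5 matched
    )
    |
    \{\s+"name":[ ]"
    \w+_\w+_\w+_\w+          # four underscore-separated parts
    ''' + TAIL,              # TAIL inlined where PCRE would call (?5)
    re.VERBOSE | re.DOTALL,
)

def first_match(text):
    """Return the text of the first match, or None if there is no match."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

# Hypothetical input; only the "name" values come from the question.
both = '''[
  {
    "name": "free-aa-bb2-123x123Profiles",
    "units": "bytes",
    "value": 10
  },
  {
    "name": "free_aa_bb2_123x123Profiles",
    "units": "bytes",
    "value": 10
  }
]'''
changed = both.replace("free_aa", "zzzz_aa")

print(first_match(both))     # the underscore object
print(first_match(changed))  # the hyphen object
```

The backreference \5 insists on the same text that group 5 captured, so the underscore twin must have the same units, value and whitespace for the lookahead to reject the hyphen object. The inlined TAIL in the second alternative, like (?5), only reuses the pattern and accepts any text that fits it.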