I assume the keys and values consist of just alphanumerics and underscores.
You may skip the descriptions with the SKIP-FAIL PCRE construct and only match the key=value pairs that are at the beginning of a line with
(?m)\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(*SKIP)(*F)|^\s*(\w+)\s*=\s*(\w+)
See the regex demo
The regex matches:
- \/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(*SKIP)(*F) - matches a multiline comment (this pattern is written with the unroll-the-loop technique and is quite efficient) and makes the regex engine discard the matched text and move the index to the end of it (thus, we ignore the descriptions). The inner group is non-capturing, so the key and the value stay in Groups 1 and 2.
- | - or...
- ^\s*(\w+)\s*=\s*(\w+) - ^ matches the start of a line, then (\w+) matches and captures into Group 1 (the key) one or more word characters, \s* matches zero or more whitespace characters, = matches a literal equals sign, \s* again matches zero or more whitespace characters, and the second (\w+) captures into Group 2 (the value) one or more word characters.
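As a rough illustration of the breakdown above, here is a minimal sketch that runs the simple pattern through preg_match_all and pairs the captured keys with their values. The sample input and the variable names ($str, $re, $pairs) are made up for this example, and the inner comment group is written as non-capturing so that the key and value land in Groups 1 and 2.

```php
<?php
// Hypothetical sample: one commented-out pair and two real pairs.
$str = "/*\nversion = 1\n*/\nname = demo\n  port = 8080\n";

// First branch: match a /* ... */ comment, then (*SKIP)(*F) discards it.
// Second branch: capture key (Group 1) and value (Group 2) at line start.
$re = '~/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(*SKIP)(*F)|^\s*(\w+)\s*=\s*(\w+)~m';

preg_match_all($re, $str, $matches);

// Build a key => value map from the two capture groups.
$pairs = array_combine($matches[1], $matches[2]);
print_r($pairs);
```

The pair inside the comment (version = 1) should not show up in $pairs, since the whole comment is consumed and failed before the second branch gets a chance to look at it.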
The (?m) and (?sm) prefixes are inline modifiers; you can also write them as flags after the closing delimiter, as in '~pattern-here~sm'. The s is the DOTALL modifier, making . match a newline. The m is the MULTILINE modifier, making ^ and $ match the beginning and end of a line, not of the whole string.
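To make the effect of the two modifiers concrete, the following small sketch compares the inline spelling with the trailing-flag spelling and shows what changes when m or s is left out. The sample string and variable names are invented for the illustration.

```php
<?php
$str = "a = 1\nb = 2"; // hypothetical two-line sample

// Without m, ^ only matches at the very start of the string.
preg_match_all('~^\w+~', $str, $noFlag);

// Inline modifier and trailing flag are two spellings of the same option.
preg_match_all('~(?m)^\w+~', $str, $inline);
preg_match_all('~^\w+~m', $str, $trailing);

// Without s, . stops at the newline, so this finds nothing.
preg_match('~a.*2~', $str, $noDotall);

// With s, . also matches the newline and the match spans both lines.
preg_match('~a.*2~s', $str, $dotall);

print_r($noFlag[0]);
print_r($inline[0]);
print_r($dotall);
```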
A variation for a more complex case, where the keys and values can consist of any characters and the value is terminated by ; followed by a newline or the end of the string:
(?sm)\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(*SKIP)(*F)|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)
See another demo
IDEONE demo:
$re = '~/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(*SKIP)(*F)|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)~sm';
$str = "/*\nanything description\n*/\n\nData1 = value1;\n\nOtherData<> = Other Value;\n\n/*\nmy other description\n*/\n\nAny thing = Any \nAnswer;\n\n/*\n\nthis is description and must not detect\n\nDescription_Data = Any_Value;\n\n*/";
preg_match_all($re, $str, $matches);
print_r($matches[1]);
print_r($matches[2]);
Output:
Array
(
[0] => Data1
[1] => OtherData<>
[2] => Any thing
)
Array
(
[0] => value1
[1] => Other Value
[2] => Any
Answer
)
To also ignore full single-line comments (lines starting with #, ; or //), you may add the ^\h*(?:\/\/|[#;])[^\n]* alternative to the SKIP-FAIL part:
(?sm)(?:^\h*(?:\/\/|[#;])[^\n]*|\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)(*SKIP)(*F)|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)
See yet another regex demo. The ^\h*(?:\/\/|[#;])[^\n]* alternative matches the start of a line (with ^), then optional horizontal whitespace (\h*), then either //, # or ;, and then zero or more characters other than a newline (add \r to the negated character class, i.e. [^\r\n]*, if you have Mac OS line endings).
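A short sketch of this last pattern in use, assuming a made-up input that mixes all three single-line comment styles with a block comment. The input text and the variable names are hypothetical; the pattern is the one given above, with ~ as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical input: three single-line comments, one block comment, two real pairs.
$str = "# note = ignored;\n// also = ignored;\n  ; old = ignored;\n"
     . "Data1 = value1;\n/*\nhidden = x;\n*/\nName = Some Value;\n";

// Single-line and block comments are matched first and dropped with (*SKIP)(*F);
// whatever is left is matched by the key = value; branch.
$re = '~(?:^\h*(?://|[#;])[^\n]*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)(*SKIP)(*F)'
    . '|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)~sm';

preg_match_all($re, $str, $matches);

// Pair the captured keys (Group 1) with the captured values (Group 2).
$pairs = array_combine($matches[1], $matches[2]);
print_r($pairs);
```

Only Data1 and Name should be reported here; the pairs inside the comments are consumed by the first branch and never reach the capturing branch.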