
Hello, this is my string:

/*
anything description
*/

Data1 = value1;

Other_Data = Other_Value;

/*
my other description
*/

Anything = Any_Answer;

/*

this is description and must not detect

Description_Data = Any_Value;

*/

Now I want to use a regex to get something like this:

Data1
Other_Data
Anything

and

value1
Other_Value
Any_Answer

in arrays, but I do not want the regex to detect anything inside a description block

/* */

such as

Description_Data = Any_Value;

This is my regex:

\h*(.*?)\h*[=]\h*(.*?)\h*[;]

My problem is that the regex matches all keys and values, even those inside a description, and for some keys it also captures everything before the key, including the whole preceding description. I want only the keys (Data1, Other_Data, Anything) and the values (value1, Other_Value, Any_Answer) listed above.

What is the problem?

MyJustWorking

1 Answer


I assume the keys and values consist of just alphanumerics and underscores.

You may skip the descriptions with the SKIP-FAIL PCRE construct and only match the key=value pairs that are at the beginning of a line with

(?m)\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(*SKIP)(*F)|^\s*(\w+)\s*=\s*(\w+)

See the regex demo

The regex matches:

  • \/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(*SKIP)(*F) - matches a multiline comment (this pattern is written with the unroll-the-loop technique and is quite efficient) and makes the regex engine discard the matched text and move the index to the end of this matched text (thus, we ignore the descriptions). The inner group is non-capturing so that the key and the value end up in Groups 1 and 2.
  • | - or...
  • ^\s*(\w+)\s*=\s*(\w+) - ^ matches the start of a line, then we match and capture into Group 1 (the key) one or more word characters (with (\w+)), then match zero or more whitespace characters (\s*) followed by =, again zero or more whitespace characters, and then we capture into Group 2 (the value) one or more word characters.
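As a rough illustration, the simple pattern can be run against the sample string from the question as sketched below. This is only a sketch: the variable names `$keys` and `$values` are made up for the example, the pattern is written with `~` delimiters so `/` needs no escaping, and the comment part uses a non-capturing group so that the key and the value land in Groups 1 and 2.

```php
<?php
// Sample input copied from the question.
$str = <<<'TXT'
/*
anything description
*/

Data1 = value1;

Other_Data = Other_Value;

/*
my other description
*/

Anything = Any_Answer;

/*

this is description and must not detect

Description_Data = Any_Value;

*/
TXT;

// First alternative: match a /* ... */ comment and discard it with (*SKIP)(*F).
// Second alternative: capture key (Group 1) and value (Group 2) at a line start.
$re = '~/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(*SKIP)(*F)|^\s*(\w+)\s*=\s*(\w+)~m';

preg_match_all($re, $str, $matches);

$keys   = $matches[1]; // hypothetical names, used only in this sketch
$values = $matches[2];

print_r($keys);
print_r($values);
```

With this input, `$keys` should hold Data1, Other_Data and Anything, and `$values` should hold value1, Other_Value and Any_Answer; Description_Data is never reached because the comment around it is skipped as a whole.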

The (?m) and (?sm) prefixes are inline modifiers; you can equally write them as flags after the closing delimiter, as in '~pattern-here~sm'. The s is the DOTALL modifier, making . match a newline, too. The m is the MULTILINE modifier, making ^ and $ match the beginning and end of a line, not of the whole string.
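The effect of the two modifiers can be checked with a small sketch like the one below. The test strings and variable names are invented for the illustration; only `preg_match_all` and `preg_match` from PHP's PCRE extension are used.

```php
<?php
$str = "a = 1\nb = 2";

// m written as an inline modifier at the start of the pattern...
preg_match_all('~(?m)^\w+~', $str, $inline);
// ...and the same modifier written as a flag after the closing delimiter.
preg_match_all('~^\w+~m', $str, $flag);
// Without m, ^ only matches at the very start of the string.
preg_match_all('~^\w+~', $str, $noFlag);

var_dump($inline[0] === $flag[0]); // both forms find "a" and "b"
print_r($noFlag[0]);               // only "a"

// s (DOTALL) lets . match a newline.
$withS    = preg_match('~a.b~s', "a\nb");
$withoutS = preg_match('~a.b~', "a\nb");
var_dump($withS, $withoutS);
```

Either spelling of the modifiers should behave the same; which one to use is mostly a matter of taste, although the inline form keeps the pattern portable to tools such as online regex testers.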

A variation for a more complex case when the keys and values can consist of any characters and the value trailing boundary is ;+newline/end of string:

(?sm)\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(*SKIP)(*F)|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)

See another demo

IDEONE demo:

// Skip /* ... */ comments, then capture the key (Group 1) and the value (Group 2)
$re = '~/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(*SKIP)(*F)|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)~sm';
$str = "/*\nanything description\n*/\n\nData1 = value1;\n\nOtherData<> = Other Value;\n\n/*\nmy other description\n*/\n\nAny thing = Any \nAnswer;\n\n/*\n\nthis is description and must not detect\n\nDescription_Data = Any_Value;\n\n*/";
preg_match_all($re, $str, $matches);
print_r($matches[1]); // keys
print_r($matches[2]); // values

Output:

Array
(
    [0] => Data1
    [1] => OtherData<>
    [2] => Any thing
)
Array
(
    [0] => value1
    [1] => Other Value
    [2] => Any 
Answer
)

To also ignore full single-line comments (lines starting with #, ; or //), you may add the ^\h*(?:\/\/|[#;])[^\n]* alternative to the SKIP-FAIL part:

(?sm)(?:^\h*(?:\/\/|[#;])[^\n]*|\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)(*SKIP)(*F)|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)

See yet another regex demo. The ^\h*(?:\/\/|[#;])[^\n]* matches the start of a line (with ^), then either //, # or ; and then zero or more characters other than newline (add \r if you have Mac OS line endings).
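A minimal sketch of this variant in PHP follows. The input string is invented for the illustration (it is not from the question), and the pattern uses `~` delimiters with the `sm` flags instead of the inline `(?sm)`.

```php
<?php
// Made-up input: three full-line comments, one block comment, two real pairs.
$str = "# hash comment with Skipped1 = no;\n"
     . "// slash comment with Skipped2 = no;\n"
     . "; semicolon comment with Skipped3 = no;\n"
     . "Data1 = value1;\n"
     . "/*\nDescription_Data = Any_Value;\n*/\n"
     . "Other_Data = Other_Value;\n";

// Both comment kinds sit in the SKIP-FAIL part; key/value are Groups 1 and 2.
$re = '~(?:^\h*(?://|[#;])[^\n]*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)(*SKIP)(*F)'
    . '|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)~sm';

preg_match_all($re, $str, $matches);

print_r($matches[1]); // keys
print_r($matches[2]); // values
```

For this input only Data1/value1 and Other_Data/Other_Value should be captured. One caveat worth testing on real data: because ^\s* in the key/value alternative can consume a preceding empty line, a single-line comment that directly follows a blank line may still be matched; using ^\h* there instead is one possible tweak, offered here as a suggestion rather than as part of the original answer.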

Wiktor Stribiżew
  • Can you explain it more simply? – MyJustWorking Feb 16 '16 at 20:46
  • The regexps skip all `/* ... */` and only grab keys and values into Group 1 and 2. Simple enough? :) – Wiktor Stribiżew Feb 16 '16 at 20:47
  • Can you tell me what I should do about # and // and ; (other comment markers) on a single line? – MyJustWorking Feb 16 '16 at 20:54
  • Do you mean you want to also skip single-line comments? If they occupy the whole line, just add the `^\h*(?:\/\/|[#;])[^\n]*` to the SKIP-FAIL part: [`(?sm)(?:^\h*(?:\/\/|[#;])[^\n]*|\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)(*SKIP)(*F)|^\s*([^=\n]+?)\s*=\s*(.*?);\h*(?:$|\r?\n)`](https://regex101.com/r/wI6iK3/5) – Wiktor Stribiżew Feb 16 '16 at 21:00
  • What is (?sm) in a regex? – MyJustWorking Feb 16 '16 at 21:08
  • These are inline modifiers, you can just write them as `'~pattern-here~sm'`. `s` is a DOTALL modifier making `.` match a newline, too. `m` is a MULTILINE modifier making `^` and `$` match the beginning and end of a *line*, not the whole string. – Wiktor Stribiżew Feb 16 '16 at 21:10