There are a few problems with your regex.
First of all, as FrankeTheKneeMan pointed out, you need delimiters. `#` is a good choice for HTML matches (the standard choice is `/`, but that clashes with the slashes in tags too often, and your pattern also contains `/` in the comment markers):
'#[/*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*/].*[/*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*/]#'
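Now, while `[.]` is a nice way of escaping a single character, it doesn't work the same way for `[/*]`. That is a character class, which matches exactly one character: either `/` or `*`, never the two-character sequence `/*`. The same goes for `[*/]`. Use this instead, with only the `*` inside brackets: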
'#/[*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*]/.*/[*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*]/#'
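To see why the character class misbehaves, here is a small check. It is a sketch in Python's `re` module rather than the delimited pattern strings used in this answer (an assumption made so it runs standalone); character classes behave the same way there. `OLD_OPEN` and `NEW_OPEN` are just illustrative names.

```python
import re

# Python's re module takes the bare pattern, so no # delimiters are needed here.
OLD_OPEN = r"[/*]"  # character class: exactly ONE character, either / or *
NEW_OPEN = r"/[*]"  # a literal / followed by an escaped *

# The class cannot match the two-character comment opener...
print(re.fullmatch(OLD_OPEN, "/*"))        # None
# ...but it happily matches a lone * (or a lone /).
print(bool(re.fullmatch(OLD_OPEN, "*")))   # True
# Escaping only the * matches exactly "/*".
print(bool(re.fullmatch(NEW_OPEN, "/*")))  # True

# So the original opener also accepts text that is not a comment at all:
print(bool(re.fullmatch(OLD_OPEN + r"\s*record", "* record")))  # True
print(bool(re.fullmatch(NEW_OPEN + r"\s*record", "* record")))  # False
```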
Now `.*` is the remaining problem. Actually, there are two problems: one is critical, the other might not be. The first is that `.` does not match line breaks by default. You can change this with the `s` (singleline, also called DOTALL) modifier. The second is that `*` is greedy. Should a section appear twice in the string, you would get everything from the first corresponding `/* record` to the last corresponding `/* record_end`, even if there is unrelated content in between. Since your records seem to be very specific, I suppose this is not the case. Still, it is good practice to make the quantifier ungreedy (`.*?`), so that it consumes as little as possible. Here is your final regex string:
'#/[*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*]/.*?/[*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*]/#s'
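Both `.*` problems show up when the pattern runs against a string that contains the same record twice. This sketch again uses Python's `re`, with `re.DOTALL` standing in for the `s` modifier; the record id `1.x.y` and the input text are made up for illustration.

```python
import re

# Hypothetical input: the same record appears twice, with unrelated text between.
text = (
    "/* record 1.x.y */\nfirst\n/* record_end 1.x.y */\n"
    "unrelated\n"
    "/* record 1.x.y */\nsecond\n/* record_end 1.x.y */\n"
)

START = r"/[*]\s*record\s*1[.]x[.]y\s*[*]/"
END = r"/[*]\s*record_end\s*1[.]x[.]y\s*[*]/"

# Without DOTALL, . cannot cross a line break, so a multi-line record never matches.
no_dotall = re.search(START + r".*?" + END, text)

# Greedy .* with DOTALL runs from the first opener to the LAST closer.
greedy = re.search(START + r".*" + END, text, re.DOTALL)

# Lazy .*? with DOTALL stops at the FIRST closer.
lazy = re.search(START + r".*?" + END, text, re.DOTALL)

print(no_dotall)                            # None
print(greedy.group(0).count("record_end"))  # 2
print(lazy.group(0).count("record_end"))    # 1
```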
For the example in your question, this becomes:
'#/[*]\s*record\s*863[.]content[.]en\s*[*]/.*?/[*]\s*record_end\s*863[.]content[.]en\s*[*]/#s'
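Applied to a hypothetical input shaped like the one in the question (the real file may differ), the pattern picks out just the English record. Python is used for the sketch, with `re.DOTALL` playing the role of the trailing `s` and the `#` delimiters dropped.

```python
import re

# Python form of the pattern for record 863.content.en.
PATTERN = re.compile(
    r"/[*]\s*record\s*863[.]content[.]en\s*[*]/"
    r".*?"
    r"/[*]\s*record_end\s*863[.]content[.]en\s*[*]/",
    re.DOTALL,
)

# Hypothetical input with a second, unrelated record after the one we want.
source = """\
/* record 863.content.en */
<p>Hello</p>
/* record_end 863.content.en */
/* record 863.content.de */
<p>Hallo</p>
/* record_end 863.content.de */
"""

match = PATTERN.search(source)
print(match.group(0))  # only the 863.content.en section, markers included
```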
If you want to find all of these sections, you can make `863`, `content` and `en` variable, capture them (using parentheses) and use backreferences to make sure you get the corresponding `record_end`:
'#/[*]\s*record\s*(\d+)[.](\w+)[.](\w+)\s*[*]/.*?/[*]\s*record_end\s*\1[.]\2[.]\3\s*[*]/#s'
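Here is a sketch of the generic pattern collecting every section, again in Python with made-up input. Because the pattern contains groups, `re.findall` returns the captured `(number, string1, string2)` tuples rather than the whole sections; `finditer` gives full match objects if you need the section text. The last record in the sample deliberately has a mismatched `record_end`, to show the backreferences rejecting it.

```python
import re

# Capture number, string1 and string2, then require the same three values
# in record_end through the backreferences \1, \2 and \3.
RECORD = re.compile(
    r"/[*]\s*record\s*(\d+)[.](\w+)[.](\w+)\s*[*]/"
    r".*?"
    r"/[*]\s*record_end\s*\1[.]\2[.]\3\s*[*]/",
    re.DOTALL,
)

# Hypothetical input; the last record's end marker does not correspond.
source = """\
/* record 863.content.en */
<p>Hello</p>
/* record_end 863.content.en */
/* record 864.title.de */
Titel
/* record_end 864.title.de */
/* record 1.a.b */
broken
/* record_end 2.a.b */
"""

# With groups present, findall returns one tuple of captures per match.
print(RECORD.findall(source))

# finditer yields match objects, so group(0) holds the whole section.
for m in RECORD.finditer(source):
    print(m.group(1), m.group(2), m.group(3))
```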