Can somebody please explain to me what this regex means?
#<hr(.*)class="system-pagebreak"(.*)\/>#iU
Is there a tool to convert these regular expressions to normal words?
It is attempting* to match any <hr> tags that have class="system-pagebreak" attributes.
The (.*) segments between hr and class, and between class and the closing />, match "zero or more characters", so the pattern can match things like:
<hr id="what" class="system-pagebreak" style="display:block" />
The iU after the closing # delimiter makes the pattern case-insensitive (i) and ungreedy (U), so that the .* matches won't eat up the whole document.
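To make the effect of the ungreedy flag concrete, here is a minimal sketch in Python. It is an approximate translation, not the original pattern: Python's re module has no U flag, so laziness is written explicitly as .*?, and the names LAZY, GREEDY and html are made up for this illustration.

```python
import re

# Approximate Python translation of #<hr(.*)class="system-pagebreak"(.*)\/>#iU.
# Python has no U flag, so the lazy behaviour is spelled out with .*?
LAZY = re.compile(r'<hr(.*?)class="system-pagebreak"(.*?)/>', re.IGNORECASE)

# The same pattern without laziness, to show what U protects against.
GREEDY = re.compile(r'<hr(.*)class="system-pagebreak"(.*)/>', re.IGNORECASE)

# Hypothetical input: the example tag followed by more markup on the same line.
html = ('<hr id="what" class="system-pagebreak" style="display:block" />'
        '<p>text</p><br />')

lazy_match = LAZY.search(html)
greedy_match = GREEDY.search(html)

# Lazy: stops at the first "/>", so only the hr tag is matched.
print(lazy_match.group(0))
# The two capture groups hold whatever sat around the class attribute.
print(lazy_match.groups())
# Greedy: the second group runs on to the last "/>" in the input.
print(greedy_match.group(0))
```

In this sketch the lazy version matches just the hr tag, while the greedy version keeps going until the final /> of the later br tag.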
Is there a tool to convert these regular expressions to normal words?
Not really. What do you mean by "normal words"? That's a very straightforward regex, and you can't "convert" it to anything else without losing its meaning. There are plenty of sites for testing regular expressions though, such as Regex101.
*Note that I say "attempting" because this is a really bad way of interacting with (X)HTML, and is sure to break eventually. You should use a DOM parser.
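As a rough sketch of the parser-based approach, the following uses Python's standard html.parser module. Note the assumptions: html.parser is an event-based parser rather than a full DOM parser, the class name PagebreakFinder and the sample markup are invented for this example, and the original question is presumably about another language, where a real DOM library would be the natural choice.

```python
from html.parser import HTMLParser


class PagebreakFinder(HTMLParser):
    """Collects the attributes of every <hr> whose class list
    contains "system-pagebreak" (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; self-closing tags also end up here.
        if tag != "hr":
            return
        attributes = dict(attrs)
        # Split the class attribute so that additional classes are allowed.
        classes = (attributes.get("class") or "").split()
        if "system-pagebreak" in classes:
            self.found.append(attributes)


finder = PagebreakFinder()
finder.feed('<hr class="system-pagebreak other" />'
            '<hr><div class="system-pagebreak"><img src="image.jpg" />')
finder.close()

print(finder.found)
```

Because the check looks at the attributes of each hr element individually, markup that merely contains the class somewhere after an hr tag is not picked up.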
This regex matches any self-closing hr with the class "system-pagebreak", but not one with additional classes.
The "actual" regex is the part between the # characters. The iU after that is two "flags" specifying how the regex will behave. The i means that the regex is case-insensitive; the U means that the regex quantifiers are lazy by default.
The first part of the regex (<hr) is evaluated as a string literal. Because of the i flag, it matches any combination like:
- <hr
- <Hr
- <hR
- <HR
Then follows a capturing group (marked by the parentheses). It contains the special character . (any character except a newline), repeated zero or more times by the *.
Then follows a literal string match for class="system-pagebreak". This will not match variations of that attribute, such as one that lists additional classes.
After that there is again any character, as often as it occurs, and then a literal match for />. The backslash just escapes the slash (it would be a special character if / were used as the delimiter; with # as the delimiter the escape is harmless but not required).
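The walk-through above can be checked with a few sample strings. This is only a sketch in Python, assuming .*? as a stand-in for the U flag; the sample tags and the names PATTERN, samples and results are hypothetical.

```python
import re

# Approximate Python equivalent of the pattern; .*? replaces the U flag.
PATTERN = re.compile(r'<hr(.*?)class="system-pagebreak"(.*?)/>', re.IGNORECASE)

# Each sample maps to whether the pattern is expected to match it.
samples = {
    '<hr class="system-pagebreak" />': True,
    # The i flag applies to the whole pattern, including the class name.
    '<HR CLASS="SYSTEM-PAGEBREAK"/>': True,
    # An additional class breaks the literal class="system-pagebreak".
    '<hr class="system-pagebreak extra" />': False,
    # Single quotes are not covered by the literal double quotes.
    "<hr class='system-pagebreak' />": False,
    # Without the closing "/>" there is nothing for the last literal to match.
    '<hr class="system-pagebreak">': False,
}

results = {text: PATTERN.search(text) is not None for text in samples}

for text, matched in results.items():
    print(matched, text)
```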
It will match <hr> tags with a class="system-pagebreak" attribute. It will also capture anything between hr and class, and between the second quotation mark and the end of the tag (/>). \/ escapes the slash. i makes it case-insensitive and U ungreedy. The pound (#) signs mark the beginning and end of the pattern.
Is there a tool to convert these regular expressions to normal words?
You can use a tool like www.regexper.com to visualize the regex: http://www.regexper.com/#%23%3Chr(.)class%3D%22system-pagebreak%22(.)%5C%2F%3E%23 This helps in understanding it.
Can somebody please explain to me what this regex means?
There are already enough good answers :)
This regex will match all characters on the same line after <hr until class="system-pagebreak" is met, and put them in the first capturing group. Then it will put all following characters (again on the same line) up to /> in capturing group 2.
The goal is probably to find self-closing hr tags that contain the class system-pagebreak. However, it's a bad pattern, since it will also match this kind of string:
<hr><div class="system-pagebreak"><img src="image.jpg" />
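The false positive can be reproduced with a short Python sketch (an approximation: .*? stands in for the U flag). The second pattern, STRICTER, is not from the original question; it is one common tightening that replaces . with [^>] so a group cannot run past the end of a tag, and it is still no substitute for a real parser.

```python
import re

# Approximate Python equivalent of the original pattern.
PATTERN = re.compile(r'<hr(.*?)class="system-pagebreak"(.*?)/>', re.IGNORECASE)

# Hypothetical tightening: [^>] keeps each group inside a single tag.
STRICTER = re.compile(r'<hr([^>]*?)class="system-pagebreak"([^>]*?)/>',
                      re.IGNORECASE)

html = '<hr><div class="system-pagebreak"><img src="image.jpg" />'

false_positive = PATTERN.search(html)

# The match spans three different tags, and the groups show how.
print(false_positive.group(0))
print(false_positive.groups())

# The stricter pattern rejects this input but still accepts a real pagebreak.
print(STRICTER.search(html))
print(STRICTER.search('<hr id="what" class="system-pagebreak" />').group(0))
```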