This regex is not working as I would like it to:

/<ol class=\"references\">\/(.*?)<\/ol>/s

I'm using it with preg_match_all in PHP:

preg_match_all("/<ol class=\"references\">\/(.*?)<\/ol>/s", $file, $mats);

When I test it on http://regex101.com/, the issue seems to be that it expects the string to look like <ol class="references">/text here</ol>, with a slash right after the opening tag.

It states: \/ matches the character / literally

However, I want the regex and PHP code to match <ol class="references">text here</ol>.
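A quick way to confirm what regex101 reports is to run the original pattern against both forms of the string. This is a minimal sketch; the two sample strings are made up for illustration, and the pattern is written in single quotes so the double quotes in the HTML need no backslashes.

```php
<?php
// The original pattern from the question: note the \/ right after the opening tag.
$pattern = '/<ol class="references">\/(.*?)<\/ol>/s';

// Hypothetical sample inputs.
$withSlash    = '<ol class="references">/text here</ol>';
$withoutSlash = '<ol class="references">text here</ol>';

// preg_match_all returns the number of full matches (or false on error).
$countWithSlash    = preg_match_all($pattern, $withSlash, $m1);
$countWithoutSlash = preg_match_all($pattern, $withoutSlash, $m2);

echo $countWithSlash, "\n";    // 1: the literal slash is present
echo $countWithoutSlash, "\n"; // 0: no slash after the opening tag, so no match
```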

Andy Lester
  • This is why you [shouldn't use regexes for parsing HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – John Conde May 14 '14 at 02:03
  • For particular reasons of my own I am using regex. – user3634795 May 14 '14 at 02:07
  • I am aware of DOM parsers. :) Thank you. – user3634795 May 14 '14 at 02:08
  • @JohnConde This is why it is ok to parse HTML with regex: http://stackoverflow.com/a/1733489/764357 –  May 14 '14 at 02:09
  • Thanks @LegoStormtroopr. I am looking for something lightweight; I am parsing lightweight text, which regex suits well. I posted my question to receive answers that comply with my specification. Furthermore, I could be doing an assignment? I've heard this mantra about regex and HTML many times. – user3634795 May 14 '14 at 02:10
  • If you are aware, why are you using regex? You are just reinventing the wheel. – Imat May 14 '14 at 02:10
  • Whether or not regex suits my needs is not my question, though thank you for the advice. Regex is efficient and pithy to code, and it works, especially for my situation. I find immediately posting "oh regex is awful for parsing html" unnecessary; it's a mantra I've heard many times, and it doesn't take into account the actual implementation the OP is using. I'm interested in why the problem in the question I mentioned is happening, not whether or not regex suits my situation. I could be studying CS theory for all you know. – user3634795 May 14 '14 at 02:13
  • @lmat there are legitimate reasons to use Regex to search through HTML. If the OP needs just the text in the `ol.class` tag, and knows there won't be other class values, then regex is suitable without the expense of parsing the whole DOM. The OP has a reason, and this isn't an unreasonable task for regex. –  May 14 '14 at 02:13
  • Thanks, I agree with you. I don't need to import a whole DOM parsing library and parse the entire DOM. – user3634795 May 14 '14 at 02:17

1 Answer

That is correct: \/ matches a literal slash, so your pattern can't match:

<ol class="references">text here</ol>

The expression requires a literal / right after the first >. Remove the \/ (keeping the / delimiters and the s modifier PHP needs) and it should work as required:

/<ol class=\"references\">(.*?)<\/ol>/s
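Put together with preg_match_all, the corrected pattern might be used like this. It is a sketch only: the `$file` contents are invented, and it assumes the inner text is what you want from capture group 1.

```php
<?php
// Hypothetical input; the list spans two lines to show why the s flag matters.
$file = '<p>intro</p><ol class="references"><li>First</li>' . "\n" . '<li>Second</li></ol>';

// PHP patterns need delimiters (/.../); the s flag lets . match newlines.
$pattern = '/<ol class="references">(.*?)<\/ol>/s';

if (preg_match_all($pattern, $file, $mats)) {
    // $mats[0] holds the full matches, $mats[1] the first capture group.
    foreach ($mats[1] as $inner) {
        echo $inner, "\n";
    }
}
```

Without the s flag, `.` would stop at the newline and this particular input would not match.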

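If the inner text sometimes starts with a slash (/) that you don't want to capture, make that slash optional with the ? quantifier. Note that this only handles a slash directly after the opening tag; slashes elsewhere in the text are still captured: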

/<ol class=\"references\">\/?(.*?)<\/ol>/s
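To see what the optional slash does and does not cover, the sketch below runs the pattern over three made-up inputs; only a slash immediately after the opening tag is dropped.

```php
<?php
// \/? consumes at most one slash directly after the opening tag,
// so it is matched but kept out of capture group 1.
$pattern = '/<ol class="references">\/?(.*?)<\/ol>/s';

// Hypothetical inputs.
$inputs = [
    '<ol class="references">text here</ol>',
    '<ol class="references">/text here</ol>',
    '<ol class="references">a/b</ol>', // slash inside the text is kept
];

$results = [];
foreach ($inputs as $input) {
    preg_match($pattern, $input, $m);
    $results[] = $m[1];
}

echo implode(' | ', $results), "\n"; // text here | text here | a/b
```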
  • Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '(' in /home/content/76/10008776/html/stuck.php on line 10 – user3634795 May 14 '14 at 02:18
  • Or Warning: preg_match_all() [function.preg-match-all]: Compilation failed: unmatched parentheses at offset 28 in /home/content/76/10008776/html/stuck.php on line 10 – user3634795 May 14 '14 at 02:18
  • Actually, no, they're the right way around. And the regex is valid according to regex101. Was it copied correctly? –  May 14 '14 at 02:23
  • Yes, I did copy it correctly. It did say valid; however, in PHP I get warnings. – user3634795 May 14 '14 at 15:03
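The "Unknown modifier '('" warning is consistent with the pattern being passed to preg_match_all without the surrounding /.../ delimiters (an assumption about how it was pasted). regex101 supplies the delimiters for you, but PHP treats the first character of the pattern as the delimiter. `<` is a valid bracket-style delimiter that closes at the matching `>`, so everything after `<ol class="references">` is parsed as modifiers, starting with `(`. A minimal sketch with an invented `$file`:

```php
<?php
$file = '<ol class="references">text here</ol>'; // hypothetical input

// No delimiters: "<" opens and the first ">" closes the pattern, so
// "(.*?)<\/ol>" is read as modifiers and compilation fails.
$undelimited = '<ol class="references">(.*?)<\/ol>';
$failed = @preg_match_all($undelimited, $file, $m); // @ hides the warning for this demo

// With explicit delimiters the pattern compiles. Using # as the delimiter
// means the slash in </ol> no longer needs escaping.
$delimited = '#<ol class="references">(.*?)</ol>#s';
$count = preg_match_all($delimited, $file, $mats);

var_dump($failed);      // bool(false)
var_dump($count);       // int(1)
echo $mats[1][0], "\n"; // text here
```

Choosing a delimiter that does not appear in the pattern (such as `#`) avoids having to escape every `/` in closing tags.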