Nothing appears to be wrong with the expression. I've matched it to the sample HTML in several editors. But once I plug it into preg_match_all, I get no results.

Any ideas?

$regex_lists = '~<ul.*?>.+?</ul>~m';
preg_match_all($regex_lists, $html, $lists);

var_dump($lists); //empty array

Sample HTML

<ul type="disc">
<br><li class="MsoNormal" style="margin: 0in 0in 10pt; line-height: normal; mso-margin-      top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list   .5in;">
<span style='font-family: "Arial","sans-serif"; font-size: 12pt; mso-fareast-font-  family: "Times New Roman";'>Maintain complete knowledge of and comply with all departmental policies/service procedures/standards. <p></p></span>
<br>
</li>
<li class="MsoNormal" style="margin: 0in 0in 10pt; line-height: normal; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;">
<span style='font-family: "Arial","sans-serif"; font-size: 12pt; mso-fareast-font-  family: "Times New Roman";'>Maintain complete knowledge of correct maintenance and use of equipment. Use equipment only as intended. <p></p></span>
<br>
</li>
</ul>
Brandon Buster
  • You might want to look into this: http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – jeroen Aug 20 '14 at 16:17
  • Thanks. I primarily use the DOMDocument, SimpleXML and XPath libs. In this instance I'm using regex because of inconsistent content and formatting. I'm using regex to break the DOM into pieces based on particular expressions, which would leave unclosed tags. Not sure if it's best practice, but it's quick, and with anubhava's suggestion I think I'm about 90% done with the task. – Brandon Buster Aug 20 '14 at 16:47
  • As long as you are aware of the options, you can always select the one that works best for your specific use-case :-) – jeroen Aug 20 '14 at 16:48

1 Answer

Since your input contains newlines, you need the s (DOTALL) flag so that the dot matches newlines as well:

$regex_lists = '~<ul.*?>.+?</ul>~is';

OR, using a negated character class so the opening-tag match cannot run past the first >:

$regex_lists = '~<ul[^>]*>.+?</ul>~is';

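PS: The m flag is not needed in your regex. It only changes where ^ and $ match (at each line instead of the whole string), and your pattern uses neither anchor.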
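To see the difference between the two modifiers, here is a minimal sketch. The `$html` string is a shortened, made-up stand-in for the sample above, and the variable names `$countM`, `$countS`, `$withM` and `$withS` are only for illustration:

```php
<?php
// Hypothetical, shortened sample: a list whose markup spans several lines.
$html = "<ul type=\"disc\">\n<li>First item</li>\n<li>Second item</li>\n</ul>";

// m (MULTILINE) only affects ^ and $; the dot still stops at "\n",
// so .+? cannot get past the newline after the opening tag.
$countM = preg_match_all('~<ul[^>]*>.+?</ul>~m', $html, $withM);

// s (DOTALL) lets the dot match "\n", so the whole list is captured.
$countS = preg_match_all('~<ul[^>]*>.+?</ul>~is', $html, $withS);

var_dump($countM); // int(0)
var_dump($countS); // int(1)
```

With the s flag, `$withS[0][0]` holds the complete `<ul>...</ul>` block.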

anubhava
  • Great! That was it. I was using the m modifier thinking that was the "dot matches new lines" modifier. Thank you for pointing me to the correct usage. – Brandon Buster Aug 20 '14 at 16:41