
I need a PHP regular expression pattern that selects each <ul></ul> list in a string separately. The string looks like this:

Lorem ipsum dolor sit amet,...
<ul>
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
</ul>
Lorem ipsum dolor sit amet,...
<ul>
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
</ul>
....

I need to extract both lists and save them in an array, so that the result looks like:

$listsarray[0] = first list code from <ul> to </ul>.
$listsarray[1] = second list code, etc.

Here is what I have tried, but it doesn't work as expected. If there are more than two lists, it selects the first two as one match (I don't know why; I'm a novice at regular expressions):

$content = 'the content like above...';
$pattern = '/<ul[^.]*<\/ul>/';
preg_match_all($pattern, $content, $listsarray);
wzazza

2 Answers


Don't use regular expressions to parse HTML. It's a bad idea, because HTML is not a regular language. You can use other tools, such as Tidy or the built-in DOMDocument class, to parse it easily without regular expressions.
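For comparison, here is a minimal sketch of the DOMDocument approach mentioned above; it is not code from the answer itself. It assumes the input is an HTML fragment like the one in the question, and the helper name extractLists is hypothetical. Each <ul> element is serialised with DOMDocument::saveHTML(), so every array entry runs from <ul> to </ul>.

```php
<?php
// Sketch: collect the outer HTML of every <ul> element with DOMDocument
// instead of a regular expression. Assumes $content is an HTML fragment.
// extractLists() is a hypothetical helper name, not a PHP built-in.
function extractLists(string $content): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them;
    // the parser still builds a tree from the fragment.
    libxml_use_internal_errors(true);
    $doc->loadHTML($content);
    libxml_clear_errors();

    $lists = [];
    foreach ($doc->getElementsByTagName('ul') as $ul) {
        // saveHTML($node) serialises only that node, from <ul> to </ul>.
        $lists[] = $doc->saveHTML($ul);
    }
    return $lists;
}

$content = "Lorem ipsum dolor sit amet,...\n<ul>\n  <li>Item 1</li>\n  <li>Item 2</li>\n</ul>\n"
         . "Lorem ipsum dolor sit amet,...\n<ul>\n  <li>Item 3</li>\n</ul>\n";

$listsarray = extractLists($content);
print_r($listsarray);
```

Unlike a regex, this copes with attributes on the tag and with nested lists. Note that getElementsByTagName() returns every descendant <ul>, so a list nested inside another list appears both as its own entry and inside its parent's entry.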

If you insist, what you're looking for is reluctant (lazy) matching instead of greedy matching:

Change the quantifier * to *? in your pattern.
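A minimal sketch of the difference between greedy and lazy matching, assuming bare <ul> tags (no attributes) and a simplified version of the question's input. It swaps the question's [^.] character class for . with the s modifier, since [^.] cannot match list items that contain a period.

```php
<?php
// Sketch: greedy vs lazy quantifier on the same input.
// The # delimiter avoids escaping the / in </ul>; the s modifier lets .
// match newlines, because each list spans several lines.
$content = "Text\n<ul>\n  <li>Item 1</li>\n</ul>\nMore text\n<ul>\n  <li>Item 2</li>\n</ul>\n";

// Greedy: .* runs on to the LAST </ul>, so both lists become one match.
preg_match_all('#<ul>.*</ul>#s', $content, $greedy);

// Lazy: .*? stops at the FIRST </ul> after each <ul>.
preg_match_all('#<ul>.*?</ul>#s', $content, $lazy);

// Index 0 holds the full matches, i.e. each list from <ul> to </ul>.
$listsarray = $lazy[0];
echo count($greedy[0]), " greedy match(es), ", count($listsarray), " lazy matches\n";
```

Lazy matching still breaks on nested lists, because .*? stops at the first </ul> it meets, even if that closes an inner list. That is one of the reasons a parser is the safer choice.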

See this post about the difference, and this one on why it's a bad idea to try to parse HTML with regular expressions.

Benjamin Gruenbaum
  • A better solution for parsing HTML is Tidy. It parses even broken HTML, lets you manipulate the tree structure and can output perfectly valid (X)HTML. – geon Jul 06 '12 at 09:34
  • Great idea, I'll edit the answer to include better alternatives to regular expressions. – Benjamin Gruenbaum Jul 06 '12 at 09:36
  • @BenjaminGruenbaum Thank you very much, *? solved all my problems. – wzazza Jul 06 '12 at 09:44
  • @BenjaminGruenbaum Hello, I guess you were right that parsing HTML with regular expressions is a bad idea; something is not working as expected. Could you give an example of how to do the same (or a similar) task with Tidy or DOMDocument? I have no experience with these and couldn't find a script example online. Can you help, please? – wzazza Jul 09 '12 at 12:10
  • If you check the documentation at php.net, there are plenty of usage examples there. – Benjamin Gruenbaum Jul 09 '12 at 15:49

Use this:

<ul>(?<ulContent>.*?)</ul>

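and read the named group ulContent. In PHP the pattern also needs delimiters, plus the s modifier so that . matches the newlines inside each list, for example '#<ul>(?<ulContent>.*?)</ul>#s'.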
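A short sketch of how the named group might be read with preg_match_all, assuming bare <ul> tags; the variable names are illustrative. With the default PREG_PATTERN_ORDER, index 0 holds the full matches (tags included, which is what the question asks for), and the ulContent key holds only the markup between the tags.

```php
<?php
// Sketch: the named-group pattern from this answer used with preg_match_all.
// '#' delimiters avoid escaping '/'; 's' lets '.' span newlines.
$content = "Intro\n<ul><li>Item 1</li></ul>\nMiddle\n<ul><li>Item 2</li></ul>\n";

preg_match_all('#<ul>(?<ulContent>.*?)</ul>#s', $content, $matches);

// $matches[0]: full matches, including the <ul> and </ul> tags.
$listsarray = $matches[0];
// $matches['ulContent'] (also available as $matches[1]): inner markup only.
$inner = $matches['ulContent'];

print_r($inner);
```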

Ria