How to use preg_match() to extract specific data using PHP

Question

Possible Duplicate:
How to parse and process HTML with PHP?

Problem:

Extract only the first <ul class="list">...</ul> from a web page using preg_match and put its contents into an array.

Code:

$str = file_get_contents('http://www.domain.com');
preg_match('#<ul class="list">(.*)</ul>#i', $str, $matches);

Desired goal:

To get the first <ul> and dump it all into an array. The <ul> should be the parent and every element inside it should be a child.

score 0 · Answer 1 · answered Oct 17 '12 at 18:03

preg_match is a string manipulation function, and knows nothing of "child elements", so will never be able to return the array you are hoping for.

You need to use a library capable of parsing the HTML for you, such as Simple HTML DOM or the built-in DOM library's loadHTML method.

[Edit - The "never" above is a slight exaggeration: you could, with a bit of effort, write your own mini-parser using nothing but preg_match, but it would be inflexible and unmaintainable compared to using an HTML parsing library.]
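
As a rough illustration of the DOM approach mentioned above, here is a minimal sketch using the built-in DOMDocument and DOMXPath classes. The sample markup, the variable names and the shape of the resulting array are assumptions made for the example, not something taken from the question's real page.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<div><ul class="list"><li>One</li><li>Two</li></ul>'
      . '<ul class="list"><li>Three</li></ul></div>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Exact attribute match, mirroring class="list"; (...)[1] keeps only the first hit.
$ul = $xpath->query('(//ul[@class="list"])[1]')->item(0);

$result = ['ul' => []];
if ($ul !== null) {
    // Direct <li> children only, so nested lists are not flattened into the parent.
    foreach ($xpath->query('./li', $ul) as $li) {
        $result['ul'][] = trim($li->textContent);
    }
}

print_r($result);
```

For a live page, $html would come from file_get_contents() as in the question. The exact attribute test would miss a tag such as class="list wide", so the XPath expression may need adjusting for real markup.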

CodeAngry · Answer 2 · 2012-10-17T18:47:35.653

$str = file_get_contents('http://www.domain.com');
preg_match('~<ul class="list">(.*?)</ul>~si', $str, $matches);

Use .*? to match up to the first (closest) closing tag. If you use .* it will match up to the last closing tag on the page. I assume your UL tag is correct.

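You also need the s and i flags: s makes the dot match newlines (single-line, or DOTALL, mode) and i makes the match case-insensitive.
Without s, the dot cannot cross a \n, so a list that spans several lines will not match.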
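
To make the effect of the quantifier and of the s flag concrete, here is a small sketch run against a hypothetical two-list string (the sample data and variable names are made up for the example):

```php
<?php
// Hypothetical sample: the first list spans several lines, the second does not.
$str = "<ul class=\"list\">\n<li>A</li>\n</ul>\n<p>x</p>\n"
     . "<ul class=\"list\"><li>B</li></ul>";

// Greedy: (.*) runs on to the LAST </ul> in the string.
preg_match('~<ul class="list">(.*)</ul>~si', $str, $greedy);

// Lazy: (.*?) stops at the FIRST </ul> after the opening tag.
preg_match('~<ul class="list">(.*?)</ul>~si', $str, $lazy);

// Without the s flag the dot cannot cross "\n", so the multi-line first
// list is skipped and the single-line second list is matched instead.
preg_match('~<ul class="list">(.*?)</ul>~i', $str, $noDotAll);

var_dump(trim($lazy[1]));   // only the first list's contents
var_dump($greedy[1]);       // both lists and everything in between
var_dump($noDotAll[1]);     // contents of the second list
```

Note that dropping s does not raise an error here: the pattern quietly matches a different list, which can be harder to spot than a failed match.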

PS: If your UL contains UL children, you should consider parsing using DOMDocument and a DOMXPath query. It's safer for more complex HTML.
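
A short sketch of why nested lists are a problem for the lazy pattern, and how a DOMXPath query copes with the same input. The markup below is a made-up example, not the asker's page:

```php
<?php
// Hypothetical markup: the first item contains a nested list.
$str = '<ul class="list"><li>A<ul><li>A1</li></ul></li><li>B</li></ul>';

// The lazy pattern stops at the first </ul> it sees, which belongs to the
// inner list, so the capture is cut off in the middle of the outer list.
preg_match('~<ul class="list">(.*?)</ul>~si', $str, $m);
var_dump($m[1]); // '<li>A<ul><li>A1</li>'

// The parser knows which </ul> closes which <ul>.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect warnings about imperfect HTML
$doc->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Direct <li> children of the first matching list only.
$items = $xpath->query('(//ul[@class="list"])[1]/li');
$count = $items->length;
var_dump($count); // 2 top-level items: A (with its nested list) and B
```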

Hope it helps.

score 0 · Answer 3 · answered Oct 17 '12 at 18:05

You want to use .+?, or you may grab more than just the first ul if there are several.

preg_match( '/<ul class="list">(.+?)<\/ul>/mis', $str, $match );

preg_match_all( '/<li>(.+?)<\/li>/mis', $match[1], $lis );

$answer = array ( 'ul' => $lis[1] );

I think that is what you were looking for.
