-2

Possible Duplicate:
How to parse and process HTML with PHP?

Problem:

Extract only the first <ul class="list"></ul> from a webpage using preg_match and dump it into an array.

Code:

$str = file_get_contents('http://www.domain.com');
preg_match('#<ul class="list">(.*)</ul>#i', $str, $matches);

Desired goal:

To get the first <ul> and dump it all into an array. The <ul> should be the parent and every element inside it should be a child.

kexxcream

3 Answers

0

preg_match is a string manipulation function, and knows nothing of "child elements", so will never be able to return the array you are hoping for.

You need to use a library capable of parsing the HTML for you, such as Simple HTML DOM or the loadHTML method of the built-in DOMDocument class.

[Edit - The "never" above is a slight exaggeration: you could, with a bit of effort, write your own mini-parser using nothing but preg_match, but it would be inflexible and unmaintainable compared to using an HTML parsing library.]
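
As a rough illustration of the parser route, here is a minimal sketch using DOMDocument and DOMXPath. The sample markup, the nodeToArray helper and the array layout (tag / text / children) are assumptions made up for this example, not something from the question. The XPath test @class="list" only matches when the class attribute is exactly "list".

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body>'
      . '<ul class="list"><li>One</li><li>Two<ul><li>Two-A</li></ul></li></ul>'
      . '<ul class="list"><li>Ignored</li></ul>'
      . '</body></html>';

// Hypothetical helper: convert an element into
// ['tag' => ..., 'text' => ..., 'children' => [...]] recursively.
function nodeToArray(DOMElement $el): array
{
    $item = ['tag' => $el->tagName, 'text' => '', 'children' => []];
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $item['children'][] = nodeToArray($child);
        } elseif ($child instanceof DOMText) {
            $item['text'] .= trim($child->nodeValue);
        }
    }
    return $item;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// (//ul[...])[1] selects the first matching <ul> in document order.
$first = $xpath->query('(//ul[@class="list"])[1]')->item(0);

// null if the page has no such list.
$tree = $first instanceof DOMElement ? nodeToArray($first) : null;
```

With the sample above, $tree describes the first list only: the ul is the parent, its two li elements are children, and the nested ul sits under the second li.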

IMSoP
0
$str = file_get_contents('http://www.domain.com');
preg_match('~<ul class="list">(.*?)</ul>~si', $str, $matches);

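Use .*? (lazy) to stop at the first closing </ul> after the opening tag. If you use .* (greedy), the match runs on to the last </ul> on the page. I assume the opening UL tag in your HTML is written exactly as in the pattern.

You also need the s and i flags: s (PCRE_DOTALL) lets . match newlines, and i makes the match case-insensitive.
Without s, . stops at the first \n, so a list that spans several lines is not matched.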
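
To make the effect of the quantifier and the s flag concrete, here is a small sketch. The two-list input string is a made-up sample, not the asker's page.

```php
<?php
// Hypothetical sample: the first list spans several lines, the second does not.
$str = "<ul class=\"list\">\n<li>A</li>\n</ul>\n<p>between</p>\n<ul class=\"list\"><li>B</li></ul>";

// Greedy .* with s: runs to the LAST </ul> in the input,
// so the capture also contains the paragraph and the second list.
preg_match('~<ul class="list">(.*)</ul>~si', $str, $greedy);

// Lazy .*? with s: stops at the FIRST </ul> after the opening tag.
preg_match('~<ul class="list">(.*?)</ul>~si', $str, $lazy);

// Lazy without s: . cannot cross a newline, so the multi-line first list
// is skipped and the single-line second list is matched instead.
preg_match('~<ul class="list">(.*?)</ul>~i', $str, $noDotAll);
```

Here $lazy[1] holds only the items of the first list, while $noDotAll[1] is '<li>B</li>': without s the pattern does not fail loudly, it quietly matches the wrong list.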

PS: If your UL contains UL children, you should consider parsing with DOMDocument and a DOMXPath query. It's safer for more complex HTML.
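
A short sketch of why nested lists are a problem for the lazy pattern, next to a DOMXPath query on the same input. The sample string is hypothetical, and the XPath expression assumes the class attribute is exactly "list".

```php
<?php
// Hypothetical sample with a nested, class-less inner list.
$str = '<ul class="list"><li>A<ul><li>A1</li></ul></li><li>B</li></ul>';

// The lazy match ends at the INNER </ul>, so <li>B</li> is lost.
preg_match('~<ul class="list">(.*?)</ul>~si', $str, $m);
// $m[1] is '<li>A<ul><li>A1</li>'

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Direct <li> children of the first ul.list; items of the nested list are not included.
$items = $xpath->query('(//ul[@class="list"])[1]/li');
$count = $items->length;   // 2: the items A and B
```

The parser sees the real element boundaries, so both top-level items are found even though a closing </ul> appears inside the first one.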

Hope it helps.

CodeAngry
0

Use the lazy .+? quantifier; otherwise you may capture more than just the first ul if the page contains several.

preg_match( '/<ul class="list">(.+?)<\/ul>/mis', $str, $match );

preg_match_all( '/<li>(.+?)<\/li>/mis', $match[1], $lis );

$answer = array ( 'ul' => $lis[1] );

I think that is what you were looking for.

James L.