I do not want to use simple_html_dom. How can I use a PHP regular expression to get the URL parts (1.html, 2.html, 3.html) and the text parts (111, 222, 333)? Thanks.

<p>items</p>
<div>
<ul>
<li><a href="1.html">111</a></li>
<li><a href="2.html">222</a></li>
<li><a href="3.html">333</a></li>
</ul>
</div>
Pekka
cj333
  • Why don't you want to use a DOM parser? It would be the right tool for the job. – Pekka Feb 25 '11 at 10:44
  • You mean regular expression? – hsz Feb 25 '11 at 10:45
  • It's even listed on the index page of the simple_html_dom website, under quick start. Did you even try to solve the problem yourself? – Andre Backlund Feb 25 '11 at 10:47
  • @hsz, yes, regular expression. – cj333 Feb 25 '11 at 10:47
  • @abloodywar, I have already got the answer with simple_html_dom, but I still want to learn regular expressions. – cj333 Feb 25 '11 at 10:49
  • possible duplicate of [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html) – RobertPitt Feb 25 '11 at 10:51
  • @cj333: If you want to **learn** the regular expressionS, then visit http://regular-expressions.info/ and http://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world - avoid writing `plzsendtehcodez` questions. – mario Feb 25 '11 at 10:55
  • HTML should not be parsed with regex. Ever. If you want to learn regular expressions, learn them on something appropriate. – Lightness Races in Orbit Feb 25 '11 at 10:59

1 Answer

By "PHP regular", I presume you mean a Perl-compatible regular expression (PCRE), which is what PHP's preg_* functions use.

preg_match_all('/<li><a href="([^"]+)">(.+?)<\/a><\/li>/', $html, $matches);

Then $matches[1] will hold the href values of the linked documents (1.html, 2.html, 3.html) and $matches[2] will hold the link text (111, 222, 333).
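
A minimal sketch of how this could be run end to end, assuming the markup from the question is held in a string $html. The variable names $count, $urls, $texts and $links are illustrative and not part of the original answer, and the pattern only matches markup shaped exactly like the sample (an <a> directly inside an <li>, with href as its only attribute).

```php
<?php
// Sample markup copied from the question (assumed to be already loaded as a string).
$html = <<<'HTML'
<p>items</p>
<div>
<ul>
<li><a href="1.html">111</a></li>
<li><a href="2.html">222</a></li>
<li><a href="3.html">333</a></li>
</ul>
</div>
HTML;

// Group 1 captures the href value, group 2 captures the link text.
// preg_match_all returns the number of full matches, or false on error.
$count = preg_match_all('/<li><a href="([^"]+)">(.+?)<\/a><\/li>/', $html, $matches);

$urls  = $matches[1]; // href values, in document order
$texts = $matches[2]; // link texts, in the same order

// Pair each URL with its text (illustrative; duplicate URLs would overwrite each other).
$links = array_combine($urls, $texts);

foreach ($links as $url => $text) {
    echo $url . ' => ' . $text . PHP_EOL;
}
```

If one row per link is more convenient than two parallel lists, passing PREG_SET_ORDER as the fourth argument to preg_match_all groups the results by match instead, so each $matches[$i] holds the full match, the URL and the text for one link.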

Savetheinternet