Problem:

I'm trying to extract specific text from HTML that I have access to in PHP.

HTML code:

<a href="/debatt/s-vill-ha-tioarig-skolplikt-och-farre-elever-i-klassen">
    <span class="number">2. </span>Skolplikt och färre elever i klassen
    <br />
    <span class="metadata">I går</span>
</a>

<a href="/sthlm/edholm-backar-om-skolornas-smorforbud">
    <span class="number">3. </span>Edholm backar om skolornas smörförbud
    <br />
    <span class="metadata">16 okt</span>
</a>

Desired output:

2. Skolplikt och färre elever i klassen
3. Edholm backar om skolornas smörförbud

Both snippets share the same HTML structure. Can this be done with Simple HTML DOM, or should I use regular expressions instead?

kexxcream
    Simple HTML DOM can do this; a ready-made class (simplehtmldom) is available online to parse the HTML and get the desired text from the DOM. – WatsMyName Oct 18 '12 at 08:59

2 Answers

Load the HTML into a DOMDocument object. From there you can select child elements (DOMElement nodes) and extract their HTML or text into variables.

Docs: http://php.net/manual/en/class.domelement.php
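
A minimal sketch of the DOM approach, run against the markup from the question. It assumes PHP's built-in DOMDocument and DOMXPath classes (my choice; the question only mentions Simple HTML DOM), that the input is UTF-8, and that every link carries a `span.number` followed directly by the headline text. The variable names are illustrative only.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<a href="/debatt/s-vill-ha-tioarig-skolplikt-och-farre-elever-i-klassen">
    <span class="number">2. </span>Skolplikt och färre elever i klassen
    <br />
    <span class="metadata">I går</span>
</a>

<a href="/sthlm/edholm-backar-om-skolornas-smorforbud">
    <span class="number">3. </span>Edholm backar om skolornas smörförbud
    <br />
    <span class="metadata">16 okt</span>
</a>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Assumption: the input is UTF-8. The XML prolog is a common workaround
// that makes loadHTML() decode it as UTF-8 rather than ISO-8859-1.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$xpath  = new DOMXPath($doc);
$titles = [];

// Visit every link that contains a <span class="number">.
foreach ($xpath->query('//a[span[@class="number"]]') as $link) {
    $number = $xpath->query('span[@class="number"]', $link)->item(0);

    // The headline is the first non-blank text node after the number span;
    // the metadata span is an element, so it is never picked up here.
    $headline = $xpath
        ->query('following-sibling::text()[normalize-space()][1]', $number)
        ->item(0);

    $titles[] = trim($number->textContent) . ' '
              . trim($headline ? $headline->nodeValue : '');
}

echo implode("\n", $titles), "\n";
```

For the two links above this should print the two lines listed under "Desired output". Selecting the text node by position keeps the date in `span.metadata` out of the result without any regular expressions.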


Same answer as https://stackoverflow.com/a/12950525/711129

Community
Hidde

If you need to do this often, Simple HTML DOM is a handy, easy-to-use class for parsing HTML:

http://simplehtmldom.sourceforge.net/

WatsMyName