I'm making a script to fetch the content of other pages, and right now I'm working on a function that should return the contents of a tag... but I'm a bit stuck :D

found a new tag of same kind inside tag...
nothing found...
1111
2222

is printed.

<?php

function d($toprint)
{
    echo $toprint."<br />";
}

function GetTagContents($source, $tag, $pos)
{   
    $startTagPos        = strpos( $source, "<".$tag, $pos );
    $startTagEndPos     = strpos( $source, ">", $startTagPos )+1;

    $endTagPos          = strpos( $source, "</".$tag, $startTagEndPos);

    $lastpos = $startTagPos+1;    
    while( $lastpos != False )
    {
        $newStartTagPos = strpos( $source, "<".$tag, $lastpos );

        if( $newStartTagPos == False )
        {
            d("nothing found...");
            $lastpos = False;        
        }
        else if( $newStartTagPos > $endTagPos )
        {
            d("out of bounds...");
            $lastpos = False;
        }
        else
        {
            d("found a new tag of same kind inside tag...");
            $lastpos =  $newStartTagPos+1;       
            $endTagPos  = strpos( $source, "</".$tag, $newStartTagPos);
        }
    }

    return substr($source, $startTagEndPos, $endTagPos-$startTagEndPos);
}
?>
<html>

    <body>
    <?php

    d(GetTagContents('<div>1111<div>2222</div>3333</div>', "div", 0));

    ?>
    </body>

</html>

Does anyone have any ideas?

Jason94
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 /me to the rescue – zerkms Feb 03 '11 at 13:01
  • Btw, if you really want to parse manually - you could start from reading http://en.wikipedia.org/wiki/Finite-state_machine – zerkms Feb 03 '11 at 13:02
  • You can use the SimpleXML class for this – AmirModiri Feb 03 '11 at 13:03

3 Answers

Using PHP DOM:

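$src = new DOMDocument('1.0', 'utf-8');
$src->formatOutput = true;
$src->preserveWhiteSpace = false;
// load() expects well-formed XML/XHTML; for plain HTML files, loadHTMLFile() is the counterpart
$src->load('path/to/file.html');

$tagName = 'foo';
// First element with the given tag name (null if there is none)
$element = $src->getElementsByTagName($tagName)->item(0);
var_dump($element->nodeValue);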
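
For HTML held in a string, as in the question, a rough sketch along the same lines might look like the following. It assumes DOMDocument::loadHTML together with libxml_use_internal_errors(true) to keep parser warnings on imperfect markup out of the output; the function name getTagContentsDom is made up for illustration and is not part of PHP DOM.

```php
<?php
// Hypothetical helper: returns the inner HTML of the first <$tag> element,
// or null if the markup cannot be parsed or the tag is not present.
function getTagContentsDom(string $html, string $tag): ?string
{
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $element = $doc->getElementsByTagName($tag)->item(0);
    if ($element === null) {
        return null;
    }

    // Serialize each child node so nested tags are kept in the result.
    $inner = '';
    foreach ($element->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

echo getTagContentsDom('<div>1111<div>2222</div>3333</div>', 'div'), "\n";
```

Because the parser tracks nesting itself, the inner div stays part of the returned contents and no manual position bookkeeping is needed.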
Richard Knop
  • after a lot of warnings NULL was dumped... :( I want to load pages from other domains, would it still work? – Jason94 Feb 03 '11 at 14:31
  • @Jason94 It should work but the HTML/XHTML must be valid. Use HTML Purifier to clean the HTML before sending it to PHP DOM. – Richard Knop Feb 03 '11 at 15:06
  • @Jason94 Could you give me a link to an example HTML website you want to parse? I will try to play with it when I come home later today to make sure it works. – Richard Knop Feb 04 '11 at 07:15

strpos will return 0 the first time, and 0 == false in PHP. The check you want is to compare the result with ===, which evaluates to true if both values are the same value and the same type. That is, 0 == false is true but 0 === false is not true.
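
A small sketch of the difference, using a made-up helper name (containsTag) purely for illustration:

```php
<?php
// Hypothetical helper: true if "<$tag" occurs anywhere in $source.
// The strict !== check keeps a match at position 0 from being read as "not found".
function containsTag(string $source, string $tag): bool
{
    return strpos($source, '<' . $tag) !== false;
}

$source = '<div>1111</div>';

$pos = strpos($source, '<div');   // int(0): the match is at the very start
var_dump($pos == false);          // bool(true): loose comparison looks like "not found"
var_dump($pos === false);         // bool(false): strict comparison sees a real match

var_dump(containsTag($source, 'div'));   // bool(true)
var_dump(containsTag($source, 'span'));  // bool(false)
```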

Tim Martin

You can use simplexml_load_string:

$xml = "<div>1111<div>2222</div>3333</div>";

$loadString = simplexml_load_string($xml);
foreach ($loadString->children() as $name => $data) {
    // Compare with ==; a single = would assign and always be true
    if ($name == 'div') {
        echo $data . "\n";
    }
}

bensiu
AmirModiri