1

My HTML string ($str) contains links with their titles/names inside span elements. I want to extract the links view.php?id=123 and view.php?id=124 and their names, galaxy and galaxy2.

Can anyone help me extract each link and the name inside its span? I tried the following, but I get no data. Thanks in advance.

$str='...............<span class="title" ><a href="view.php?id=123" class="title"><strong>galaxy</strong></a></span>............<span class="title" style=background:#000000><a href="watch.php?id=124" class="title"><strong>galaxy2</strong></a></span>';

if(preg_match_all('/\<span class="title" ><a href=(.*?)\<\/strong>/',$str,$match)) 
{             
    echo "<br>href:".$match[1][0];
    echo "<br>";
    echo "title:";
}

Sample data for $str:

<div class="profile cleaning" id="contentlist">
<div class="profile-item ">
<div class="img" data-preview="view.php?id=123">
<img src="./logos/123.jpg" width="240" height="140" alt="">
</div>
<span class="title" style=background:#000000><a href="view.php?id=123" class="title"><strong>Galaxy 1</strong></a></span>
</div><div class="profile-item ">
<div class="img" data-preview="view.php?id=124">
<img src="./logos/124.jpg" width="240" height="140" alt="">
</div>
<span class="title" style=background:#000000><a href="view.php?id=124" class="title"><strong>Galaxy 2</strong></a></span>
</div><div class="profile-item ">
<div class="img" data-preview="view.php?id=125">
<img src="./logos/125.png" width="240" height="140" alt="">
</div>
<span class="title" style=background:#000000><a href="view.php?id=125" class="title"><strong>Galaxy 3</strong></a></span>
</div><div class="profile-item " style="background:#000000;border:1px solid #326EE0;">
<div class="img" data-preview="view.php?id=126">

<div style="position: relative; left: 0; top: 0;vertical-align:top">
<img src="./logos/126.png" style="border: none;padding:1px;border:2px solid #326EE0;margin:0px;margin-bottom:2px;width:240px;position: relative; top: 0; left: 0; " >
<img src="images/mango.png" style="width:240px;position: absolute; top: 0px; left: 0px;"/>
</div>

</div>
<span class="title" ><a href="view.php?id=126" class="title"><strong>Galaxy 4</strong></a></span>
</div><div class="profile-item " style="background:#000000;border:1px solid #326EE0;">
<div class="img" data-preview="view.php?id=127">

<div style="position: relative; left: 0; top: 0;vertical-align:top">
<img src="./logos/127.jpg" style="border: none;padding:1px;border:2px solid #326EE0;margin:0px;margin-bottom:2px;width:240px;position: relative; top: 0; left: 0; " >
<img src="images/mango.png" style="width:240px;position: absolute; top: 0px; left: 0px;"/>
</div>

</div>
<span class="title" ><a href="view.php?id=127" class="title"><strong>Galaxy 5</strong></a></span>
</div><div class="profile-item " style="background:#000000;border:1px solid #326EE0;">
<div class="img" data-preview="view.php?id=128">

<div style="position: relative; left: 0; top: 0;vertical-align:top">
<img src="./logos/128.jpg" style="border: none;padding:1px;border:2px solid #326EE0;margin:0px;margin-bottom:2px;width:240px;position: relative; top: 0; left: 0; " >
<img src="images/mango.png" style="width:240px;position: absolute; top: 0px; left: 0px;"/>
</div>

</div>
<span class="title" ><a href="view.php?id=128" class="title"><strong>Galaxy 6</strong></a></span>
</div></div>
user1788736
  • Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – chris85 Dec 25 '15 at 15:19
  • The HTML is malformed! And are the dots really included in the html? – Jan Dec 25 '15 at 16:04
  • It's a big HTML document. I added those dots to indicate that the HTML continues. – user1788736 Dec 25 '15 at 16:08

2 Answers

1

You can write a small function for that using plain string functions:

$str='zxcvbnm<a href="http://www.example.com">zxcv</a>qwertyuiop<span class="title" ><a href="view.php?id=123" class="title"><strong>galaxy</strong></a></span>asdfghjkl<span class="title" style=background:#000000><a href="watch.php?id=124" class="title"><strong>galaxy2</strong></a></span>';

function parse_hrefANDname($str) {
    // Nothing to do if the string has no <span class="title" ...> at all
    if (strpos($str, '<span class="title"') === false) return false;

    $res = array();

    // Split on the opening span so only links inside those spans are considered
    $str_arr = explode('<span class="title"', $str);
    foreach ($str_arr as $k => $line) {
        if ($k == 0) continue; // text before the first matching span

        // Skip ahead to the first link in this chunk
        $pos = strpos($line, '<a href=');
        if ($pos === false) continue;
        $line = substr($line, $pos + 8);

        $href_quote = substr($line, 0, 1); // some write href="", some href=''
        $href_val = substr($line, 1);
        $href_val = substr($href_val, 0, strpos($href_val, $href_quote));

        $name = substr($line, strpos($line, '<strong>') + 8);
        $name = substr($name, 0, strpos($name, '</strong>'));

        $res[] = array('href' => $href_val, 'name' => $name);
    }

    return $res;
}

$arr = parse_hrefANDname($str);
print_r($arr);
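
For comparison, here is a minimal sketch of the same restriction written with preg_match_all, the approach the question attempted. It assumes the markup always has the shape span class="title", then a, then strong; the helper name extract_title_links is made up for illustration. The `[^>]*` after `class="title"` is what lets it match spans that carry extra attributes such as style=background:#000000.

```php
<?php
// Hypothetical helper name; a sketch, not a general-purpose HTML parser.
function extract_title_links(string $html): array
{
    // <span class="title" ...> followed by <a href="..."> and <strong>name</strong>.
    // [^>]* tolerates extra attributes; (["\']) ... \1 accepts either quote style.
    $pattern = '~<span class="title"[^>]*>\s*<a href=(["\'])(.*?)\1[^>]*>\s*<strong>(.*?)</strong>~s';

    $links = array();
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[2] is the href value, $m[3] is the text inside <strong>
            $links[] = array('href' => $m[2], 'name' => $m[3]);
        }
    }
    return $links;
}

$str = 'zxcvbnm<a href="http://www.example.com">zxcv</a>qwertyuiop<span class="title" ><a href="view.php?id=123" class="title"><strong>galaxy</strong></a></span>asdfghjkl<span class="title" style=background:#000000><a href="watch.php?id=124" class="title"><strong>galaxy2</strong></a></span>';
print_r(extract_title_links($str));
```

With this input the example.com link is skipped because it is not inside a title span. A regex like this breaks as soon as the markup changes shape, so treat it as a quick fix only.
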
ashazar
  • Thanks for the code. I tried it and it prints out all the href URLs found in my HTML string! The HTML I receive via cURL has lots of hrefs, and I am only interested in the ones inside the sample span. – user1788736 Dec 25 '15 at 16:06
  • I edited the code. It should work for your needs. But please note that your span should look like 'span class="title"' (class="title" should come right after span). – ashazar Dec 25 '15 at 20:02
  • Thanks a lot. Your previous code worked as well, once I trimmed the string data down to the HTML chunk I needed! – user1788736 Dec 26 '15 at 00:48
0

You can use SimpleXML to do that. Child elements are accessed as properties and attributes with an array-like syntax, which is more robust than using a regex for this purpose:

$str = '<container><span class="title" ><a href="view.php?id=123" class="title"><strong>galaxy</strong></a></span></container>';
$xml = simplexml_load_string($str);
echo $xml->span->a["href"]; // view.php?id=123
echo $xml->span->a->strong; // galaxy

For your situation, with multiple spans:

<?php
$str='<container>
    <span class="title">
        <a href="view.php?id=123" class="title"><strong>galaxy</strong></a>
    </span>
    <span class="title" style="background:#000000">
        <a href="watch.php?id=124" class="title"><strong>galaxy2</strong></a>
    </span>
 </container>';
$xml = simplexml_load_string($str);
foreach ($xml->span as $span) {
    echo "Link: " . $span->a["href"] . "<br/>";
    echo "Content: " . $span->a->strong->__toString() . "<br/>";
}
?>

Hint: I made the container tag up; in your case the root element is likely to be html or xml. Additionally, I had to correct the markup by adding double quotes around the style attribute value.
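
SimpleXML needs well-formed XML, so HTML taken from a real page (no single root element, unquoted attribute values) may fail to parse. Below is a minimal sketch of an alternative that uses DOMDocument::loadHTML and DOMXPath, which tolerate this kind of markup. It assumes the DOM extension is enabled and that every wanted link is a direct child of a span whose class attribute is exactly "title"; the helper name extract_title_links_dom is made up for illustration.

```php
<?php
// Hypothetical helper name; assumes the dom and libxml extensions are available.
function extract_title_links_dom(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about malformed HTML instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = array();
    // Every <a> directly inside a <span> whose class attribute is exactly "title"
    foreach ($xpath->query('//span[@class="title"]/a') as $a) {
        $links[] = array(
            'href' => $a->getAttribute('href'),
            'name' => trim($a->textContent),
        );
    }
    return $links;
}

// A shortened excerpt of the sample data from the question
$str = '<div class="profile-item ">
<span class="title" style=background:#000000><a href="view.php?id=123" class="title"><strong>Galaxy 1</strong></a></span>
</div><div class="profile-item ">
<span class="title" ><a href="view.php?id=126" class="title"><strong>Galaxy 4</strong></a></span>
</div>';

foreach (extract_title_links_dom($str) as $link) {
    echo "Link: " . $link['href'] . "<br/>";
    echo "Content: " . $link['name'] . "<br/>";
}
```

No container tag is needed here, because loadHTML wraps the fragment in html and body elements itself. If the fetched page contains non-ASCII text, the character encoding may need extra handling, which this sketch does not cover.
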

Jan
  • Thanks for the reply. I am fetching the HTML with PHP cURL ($str= get_data('http://somesite.php?id=11');). It has no container tag, and there are many such span sets in the data received. I want all of those links and their titles. I tried your code but got this error: E_WARNING : type 2 -- simplexml_load_string(): Entity: line 1: parser error : StartTag: invalid element name -- at line 2. How can I fix it? – user1788736 Dec 25 '15 at 15:37
  • Do you have a link to the original source? – Jan Dec 25 '15 at 15:38