My HTML file contains a date and a link in <span> tags within a table. Can anyone help me find the link for a particular date?

<table>
<tbody>
<tr class="c0">
<td class="c11">
<td class="c8">
<ul class="c2 lst-kix_h6z8amo254ry-0 start">
<li class="c1">
<span>1st Apr 2014 - </span>
<span class="c6"><a class="c4" href="/link.html">View</a>
</span>
</li>
</ul>
</td>
</td>
</tr>
</tbody>
</table>

I want to retrieve the link for a particular date.

My code looks like this:

include('simple_html_dom.php');    
$html = file_get_html('link.html');
//store the links in array
foreach($html->find('span') as $value)
{
    //echo $value->plaintext . '<br />';
    $date = $value->plaintext;

    if (strpos($date,$compare_text)) {
         //$linkeachday = $value->find('span[class=c1]')->href;
        //$day_url[] = $value->href;
        //$day_url = Array("text" => $value->plaintext);
        $day_url = Array("text" => $date, "link" =>$linkeachday);
        //echo $value->next_sibling (a);
    }
}

or

$spans = $html->find('table',0)->find('li')->find('span');
echo $spans;
 $num = null;
 foreach($spans as $span){
     if($span->plaintext == $compare_text){
        $next_span = $span->next_sibling();
        $num = $next_span->plaintext;
         echo($num);    
        break; 
     }
 }
 echo($num);
Jenz
Lipsa
  • Did you try the DomDocument class? – toesslab Apr 16 '14 at 06:31
  • Wow, *simplehtmldom* hasn't had a release since 2008. I'd avoid that like the plague. See the comments in this answer for more reasons - http://stackoverflow.com/a/3577662/283366 – Phil Apr 16 '14 at 06:44
  • @phil - where do you see that? The last update was in 2013 – pguardiario Apr 16 '14 at 06:53
  • @pguardiario looking at the tags in their SVN repo. I guess they stopped using tags. Still, the latest release says 2012-09-10. There's also plenty of negative press around this library so I still withhold any recommendations over the standard DOM library – Phil Apr 16 '14 at 07:00
  • @phil - When it works, simple html dom is concise, readable, and provides a much nicer interface than DOM. Unfortunately it only works some of the time. – pguardiario Apr 16 '14 at 09:04

3 Answers

0

I don't know about Simple HTML DOM, but the built-in PHP DOM library should suffice.

Say you have your date in a string like this...

$date = '1st Apr 2014';

You can easily find the corresponding link using an XPath expression. For example:

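$doc = new DOMDocument();
// Collect parser warnings quietly; the markup in question is not well-formed
libxml_use_internal_errors(true);
$doc->loadHTMLFile('link.html');

$xp = new DOMXPath($doc);
// Find the span whose text starts with the date, then the link in a following sibling span
$query = sprintf('//span[starts-with(., "%s")]/following-sibling::span/a', $date);

$href = null;
$links = $xp->query($query);
if ($links->length) {
    $href = $links->item(0)->getAttribute('href');
}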
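A slightly more defensive variant might look like the sketch below. It is only an illustration: the helper name findLinkForDate and the sample markup are assumptions, not part of the question. It trims leading whitespace with normalize-space(), looks only at the first following span so that an earlier date cannot pick up a later date's link, and refuses dates containing a double quote, since XPath 1.0 string literals cannot escape one.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href of the
// link that follows the span starting with $date, or null if none is found.
function findLinkForDate(string $html, string $date): ?string
{
    // XPath 1.0 string literals have no escape syntax, so refuse dates
    // containing a double quote rather than build a broken expression.
    if (strpos($date, '"') !== false) {
        return null;
    }

    // Real-world HTML is often malformed; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $query = sprintf(
        '//span[starts-with(normalize-space(.), "%s")]/following-sibling::span[1]/a',
        $date
    );
    $links = $xp->query($query);

    return ($links !== false && $links->length > 0)
        ? $links->item(0)->getAttribute('href')
        : null;
}

// Sample markup modelled on the table in the question (an assumption).
$sample = '<table><tr><td><ul><li>'
    . '<span>1st Apr 2014 - </span>'
    . '<span class="c6"><a class="c4" href="/link.html">View</a></span>'
    . '</li></ul></td></tr></table>';

var_dump(findLinkForDate($sample, '1st Apr 2014')); // date present in the sample
var_dump(findLinkForDate($sample, '2nd Apr 2014')); // date absent from the sample
```

To read from a file instead, pass file_get_contents('link.html') as the first argument.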
Phil
  • XPath is great for XML, but CSS is much better for HTML – pguardiario Apr 16 '14 at 06:39
  • @pguardiario I have no idea what you mean. XPath is relevant for any DOM document (which includes HTML). CSS style selectors also have no way of matching on an element's contents – Phil Apr 16 '14 at 06:40
  • @pguardiario Yes, *relevant* as in that it works and does so quickly and accurately. – Phil Apr 16 '14 at 07:05
  • If that were all that mattered, I suppose stylesheets would be written in xpath. – pguardiario Apr 16 '14 at 08:55
0

You were on the right path with your last example.

I modified it a bit: the code below gets all the spans, tests whether each one starts with the searched text and, if so, displays the content of its next sibling, if there is one (see the in-code comments):

$input =  <<<_DATA_
    <table>
        <tbody>
            <tr class="c0">
                <td class="c11">
                    <td class="c8">
                        <ul class="c2 lst-kix_h6z8amo254ry-0 start">
                            <li class="c1">
                                <span>1st Apr 2013 - </span>
                                <span>1st Apr 2014 - </span>
                                <span class="c6">
                                    <a class="c4" href="/link.html">View</a>
                                </span>
                                <span>1st Apr 2015 - </span>
                            </li>
                        </ul>
                    </td>
                </td>
            </tr>
        </tbody>
    </table>
_DATA_;

// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($input);

// Searched value
$searchDate = '1st Apr 2014';

// Find all the spans that are direct children of an li, which is a descendant of table
$spans = $html->find('table li > span');

// Loop through all the spans
foreach ($spans as $span) {
    // If the span starts with the searched text && has a following sibling
    if ( strpos($span->plaintext, $searchDate) === 0 && $sibling = $span->next_sibling()) {
        // Then, print its text content
        echo $sibling->plaintext;    // or ->innertext for raw content
        // And stop (if only one result is needed)
        break;
    }
}

OUTPUT

View

For the string comparison, you may prefer a regex, which also tolerates leading whitespace and differences in letter case.

In the code above, add this to build the pattern:

$pattern = sprintf('~^\s*%s~i', preg_quote($searchDate, '~'));

And then use preg_match to test the match:

if ( preg_match($pattern, $span->plaintext) && $sibling = $span->next_sibling()) {
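As a self-contained sketch of that regex test, the pattern can be wrapped in a small function and tried on plain strings. The function name startsWithDate and the sample strings are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: true if $text begins (ignoring leading whitespace
// and letter case) with the literal $searchDate.
function startsWithDate(string $text, string $searchDate): bool
{
    // preg_quote escapes regex metacharacters as well as the "~" delimiter.
    $pattern = sprintf('~^\s*%s~i', preg_quote($searchDate, '~'));
    return preg_match($pattern, $text) === 1;
}

// Leading spaces and different letter case: still counts as a match.
var_dump(startsWithDate('  1st apr 2014 - ', '1st Apr 2014'));

// The searched date appears inside "21st Apr 2014" but not at the start,
// so the anchored pattern rejects it.
var_dump(startsWithDate('21st Apr 2014 - ', '1st Apr 2014'));
```

Anchoring at the start is the reason to prefer this over a plain "contains" check such as strpos(...) !== false, which would accept the second string as well.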
Enissay
  • Sometimes it makes sense to use something other than `/` as a preg delimiter. But there needs to be a reason. Otherwise your regex is messy looking for nothing. – pguardiario Apr 16 '14 at 09:56
  • `Any non-alphanumeric, non-backslash, non-whitespace character can be used as a delimiter` and it must be escaped when it appears inside the pattern... Simply for better readability, it would be good to choose a char that doesn't appear in your pattern... And it happens that `/` is the favourite/most used one, that's it! So it's true there's no reason to use `~` in this case, but none either for using `/`... – Enissay Apr 16 '14 at 11:29
  • Using `~` instead of `/` impedes readability in general. Unless there's a good reason, of course. – pguardiario Apr 16 '14 at 12:20
0
include('simple_html_dom.php');

$html = file_get_html('link.html');
$compare_text = "1st Apr 2013";

// Index 1 selects the second table on the page (indexes are zero-based)
$tds = $html->find('table', 1)->find('span');

$num = 0;
foreach ($tds as $td) {
    if (strpos($td->plaintext, $compare_text) !== false) {
        $next_td = $td->next_sibling();
        // Keep the href of the last link inside the next sibling
        foreach ($next_td->find('a') as $elm) {
            $num = $elm->href;
        }
        //$day_url = array($day => array(daylink => $day, text => $td->plaintext, link => $num));
        echo $td->plaintext . "<br />";
        echo $num . "<br />";
    }
}
Lipsa
  • If $compare_text is an array containing a number of dates, can I save the echoed output in a new multidimensional array, like the commented-out statement? Can anyone help me with this? – Lipsa Apr 18 '14 at 04:51
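
One possible way to do that, sketched here with the built-in DOM extension rather than simple_html_dom, is to loop over the dates and fill an array keyed by date. The function name collectLinks, the array keys and the sample markup are assumptions for illustration, not code from this thread.

```php
<?php
// Hypothetical helper: maps each requested date to the span text and the link
// found in the span that follows it. Dates with no match are left out.
function collectLinks(string $html, array $dates): array
{
    // Collect parser warnings quietly, since scraped HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $result = [];

    foreach ($xp->query('//li/span') as $span) {
        $text = trim($span->textContent);
        foreach ($dates as $date) {
            // Only accept spans whose text starts with the date.
            if (strpos($text, $date) !== 0) {
                continue;
            }
            // Look for a link inside the first following sibling span.
            $anchor = $xp->query('following-sibling::span[1]/a', $span)->item(0);
            if ($anchor !== null) {
                $result[$date] = [
                    'text' => $text,
                    'link' => $anchor->getAttribute('href'),
                ];
            }
        }
    }

    return $result;
}

// Sample markup (an assumption) with two dated links.
$sample = '<ul>'
    . '<li><span>1st Apr 2014 - </span><span><a href="/one.html">View</a></span></li>'
    . '<li><span>2nd Apr 2014 - </span><span><a href="/two.html">View</a></span></li>'
    . '</ul>';

print_r(collectLinks($sample, ['1st Apr 2014', '2nd Apr 2014', '3rd Apr 2014']));
```

Each entry can then be read as $result['1st Apr 2014']['link'], and a date that is not on the page simply has no key in the result.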