I'm creating a function that takes a URL and fetches the content of the page. If the page contains "Next>", I would like to grab that link's URL and continue on to the next page, until a page no longer contains "Next".

How would this be done? With a while loop?

check_url("http://site.com");
-> url contains 'next', href is http://site.com/ggkdoe

-> does http://site.com/ggkdoe contain 'next'? If so, fetch it, check whether that page contains 'next', get that URL, and so on

Does that make sense? How can this be done?

Thank you in advance

tony
  • Normally, the "Next" button is generated server-side, not by parsing client-side output. – Raptor Feb 28 '12 at 09:00
  • possible duplicate of [Robust, Mature HTML Parser for PHP](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) – CodeCaster Feb 28 '12 at 09:01
  • I can get the URLs fine, I just need to keep continuing on until the term doesn't exist on the page anymore. I'm using Simple HTML Dom – tony Feb 28 '12 at 09:01

2 Answers

Most likely something like this:

<?php
$currentURL = "http://site.com";
do {
    // check_url() returns the next page's URL, or null if there is none.
    $check = check_url($currentURL);
    $checkNext = ($check !== null);
    if ($checkNext) {
        // Move on to the next page and check it in the following iteration.
        $currentURL = $check;
    }
} while ($checkNext);

I assume that check_url() returns a URL if one could be found and null otherwise. The do-while loop ensures that the check runs at least once for the initial URL, and then again for as long as check_url() finds another URL. When the loop ends, $currentURL holds the last page reached, which you can use for whatever you want.
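For illustration, here is a minimal sketch of what such a check_url() could look like. It assumes PHP 7.1 or later and uses the built-in DOMDocument rather than Simple HTML Dom; the helper name find_next_url() is hypothetical, and matching on link text that contains "next" is an assumption about the page markup. The href is returned as written, so relative links would still need to be resolved against the current URL.

```php
<?php
// Hypothetical helper: returns the href of the first <a> element whose
// text contains "next" (case-insensitive), or null if there is none.
function find_next_url(string $html): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        if ($link->hasAttribute('href')
            && stripos($link->textContent, 'next') !== false) {
            return $link->getAttribute('href');
        }
    }
    return null;
}

// Returns the URL of the "next" page, or null if the page cannot be
// fetched or has no such link. Fetching via file_get_contents() is an
// assumption; it requires allow_url_fopen to be enabled.
function check_url(string $url): ?string
{
    $html = @file_get_contents($url);
    if ($html === false) {
        return null;
    }
    return find_next_url($html);
}
```

Keeping the parsing in its own function means it can be tried on a plain HTML string without any network access.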

Till Helge

You could use recursion for a complete link search:

function checkUrl($url) {
    $atLeastOneUrl = false;
    // Check your content
    // Log some data about the current URL
    $urlsFound = array(); // Placeholder: fill with the URLs found in the content
    foreach ($urlsFound as $urlFound) {
        checkUrl($urlFound);
        $atLeastOneUrl = true;
    }

    return $atLeastOneUrl;
}

But you will want to make sure that a cycle such as link 1 --> link 2 --> ... --> link 1 doesn't interfere with your search ;)
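One way to guard against such a cycle is to remember every URL already visited and stop as soon as one repeats. The following is only a sketch under assumptions: the function name follow_next_links(), the $maxPages safety limit, and passing the page checker in as a callable are all hypothetical choices, not part of the original code.

```php
<?php
// Hypothetical helper: follows "next" links starting at $startUrl until
// there is no next link, a URL repeats (cycle), or $maxPages is reached.
// $checkUrl is any callable that returns the next URL or null.
// Returns the list of visited URLs in order.
function follow_next_links(string $startUrl, callable $checkUrl, int $maxPages = 1000): array
{
    $visited = []; // URL => true, used as a set for fast lookups
    $url = $startUrl;

    while ($url !== null && !isset($visited[$url]) && count($visited) < $maxPages) {
        $visited[$url] = true;
        $url = $checkUrl($url);
    }

    return array_keys($visited);
}
```

With a check_url() function that returns the next URL or null, this could be called as follow_next_links("http://site.com", 'check_url').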

Michael Laffargue