Currently, this PHP script scrapes all content between a beginning tag and a closing tag. I use it to scrape the titles of poems. For this to work, the user must manually enter the URL to scrape, the beginning tag, and the closing tag.

<?php
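// Fall back to empty values on the first page load, before the form is posted.
$url=$_POST["url"] ?? '';
$beg=$_POST["beg"] ?? '';
$end=$_POST["end"] ?? '';
$tryscrape=$_POST["tryscrape"] ?? 0;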
$end=str_replace('/','\/', $end);
$beg=str_replace('/','\/', $beg);
$end=str_replace('\"','"', $end);
$beg=str_replace('\"','"', $beg);

echo '<form action="' . htmlspecialchars($_SERVER['PHP_SELF']) . '" method="post">
Beginning: <input name="beg" value="" style="width: 100px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;End: <input name="end" value="" style="width: 100px;"><BR>
URL: <input name="url" value="' . htmlspecialchars((string) $url) . '" style="width: 225px;">
<input type="hidden" name="tryscrape" value="1">
<input name="submit" type="submit" value="Scrape >>" class="button" />
</form>';


echo 'Scrape Results for <strong>' . htmlspecialchars((string) $url) . '</strong><br><br>';

if($tryscrape==1)
{
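    $data = file_get_contents($url);
    if ($data === false) {
        // Stop early if the page could not be fetched.
        die('Could not fetch ' . htmlspecialchars((string) $url));
    }
    // Lazily capture everything between the beginning and closing tags.
    $regex = '/'.$beg.'(.+?)'.$end.'/';
    preg_match_all($regex,$data,$match,PREG_SET_ORDER);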
    foreach ($match as $result) {
        $link = $result[1];
        $link=strip_tags($link);
        echo $link . '<br>';
    }
}

?>

Now I am stuck. I want to replace the "URL to scrape" input field with a "keyword" field, while the URL to scrape itself is fixed to "http://www.poemhunter.com/search/?w=title&q=" . $keyword . "&p=" . $randomnumberfrom1to30, where $randomnumberfrom1to30 is a random page number from 1 to 30.
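
A minimal sketch of how that fixed URL could be built, assuming the keyword arrives in a `keyword` POST field (a hypothetical field name that is not in the current form). The helper name `buildSearchUrl` is also made up for illustration, and the `'love'` fallback is only there so the snippet runs on its own:

```php
<?php
// Hypothetical helper: builds the fixed search URL from a keyword
// and a random page number between 1 and $maxPage.
function buildSearchUrl(string $keyword, int $maxPage = 30): string
{
    $page = random_int(1, $maxPage);

    // urlencode() keeps spaces and special characters from breaking the query string.
    return 'http://www.poemhunter.com/search/?w=title&q='
        . urlencode($keyword) . '&p=' . $page;
}

// 'keyword' is an assumed form field; 'love' is a stand-in default for testing.
$keyword = $_POST['keyword'] ?? 'love';
$url = buildSearchUrl($keyword);
echo $url, "\n";
```

The resulting `$url` could then be passed to `file_get_contents()` in place of the user-supplied URL.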

The other change is to limit the displayed titles to 5. A page usually yields 25 scraped titles, and the script should display only 5 of them, selected at random.
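
A rough sketch of the random selection, assuming the scraped titles have first been collected into a plain array instead of being echoed inside the loop. The helper name `pickRandomTitles` and the stand-in titles are hypothetical:

```php
<?php
// Hypothetical helper: returns up to $limit randomly chosen entries from $titles.
function pickRandomTitles(array $titles, int $limit = 5): array
{
    $titles = array_values($titles); // work on a re-indexed copy
    shuffle($titles);                // randomize the order in place

    // array_slice() simply returns fewer items if fewer than $limit were scraped.
    return array_slice($titles, 0, $limit);
}

// Stand-in data; in the script these would be the strip_tags() results from $match.
$titles = [];
for ($i = 1; $i <= 25; $i++) {
    $titles[] = "Poem title $i";
}

foreach (pickRandomTitles($titles) as $title) {
    echo $title, "<br>\n";
}
```

Shuffling and slicing avoids duplicates and still behaves sensibly when a page returns fewer than 5 titles.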

I would really appreciate it if somebody could help. Thank you!

emotheraphy