I have a script that takes an extremely long time to run. This is what it has to do:
There is one website I need to collect data from, and to do so my algorithm has to visit over 15,000 pages of this website (i.e. www.website.com/item.php?rid=$_id), where $_id is the current iteration of a for loop.
Here are the problems:
- The method I am currently using to get the source code of each page is file_get_contents, and, as you can imagine, it takes a very long time to call file_get_contents on 15,000+ pages.
- Each page contains over 900 lines of code, but all I need to extract is about 5 lines' worth, so it seems the algorithm is wasting a lot of time by retrieving all 900 lines.
- Some of the pages do not exist (i.e. maybe www.website.com/item.php?rid=2 exists but www.website.com/item.php?rid=3 does not), so I need a way to quickly skip these pages before the algorithm tries to fetch their contents and wastes a bunch of time.
In short, I need a method of extracting a small portion of the page from 15,000 webpages in as quick and efficient a manner as possible.
Here is my current code.
for ($_id = 0; $_id < 15392; $_id++) {
    //****************************************************** Locating page
    $_location = "http://www.website.com/item.php?rid=" . $_id;
    $_headers = @get_headers($_location);
    // Skip pages that cannot be reached or do not return a 200 status
    if ($_headers === false || strpos($_headers[0], "200") === false) {
        continue;
    } // end if
    $_source = file_get_contents($_location);
    //****************************************************** Extracting price
    $_needle_initial = "<td align=\"center\" colspan=\"4\" style=\"font-weight: bold\">Current Price:";
    $_needle_terminal = "</td>";
    $_position_initial = stripos($_source, $_needle_initial);
    if ($_position_initial === false) {
        continue; // price cell not found on this page
    } // end if
    $_position_initial += strlen($_needle_initial);
    // Look for the closing tag only after the price label, not from the start of the page
    $_position_terminal = stripos($_source, $_needle_terminal, $_position_initial);
    $_length = $_position_terminal - $_position_initial;
    $_current_price = strip_tags(trim(substr($_source, $_position_initial, $_length)));
} // end for
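For context on where the time goes: the loop above makes two HTTP requests per existing page (get_headers, then file_get_contents), one page at a time. Below is a minimal sketch of one possible alternative, assuming the cURL extension is enabled: each page is requested once, the status code is read from that same response, and several pages are fetched concurrently. The function names fetch_batch and extract_price and the 10-second timeout are hypothetical choices made for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: pull the text that follows the "Current Price:" label.
function extract_price(string $source): ?string
{
    $needle = '<td align="center" colspan="4" style="font-weight: bold">Current Price:';
    $start = stripos($source, $needle);
    if ($start === false) {
        return null; // label not present on this page
    }
    $start += strlen($needle);
    $end = stripos($source, '</td>', $start); // first closing tag after the label
    if ($end === false) {
        return null;
    }
    return trim(strip_tags(substr($source, $start, $end - $start)));
}

// Hypothetical helper: fetch several URLs concurrently, one request per URL.
// Returns [url => body] for responses with HTTP status 200 only.
function fetch_batch(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
            CURLOPT_TIMEOUT => 10,          // assumed timeout in seconds
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $ch) {
        // Missing pages are dropped here, without a separate header request.
        if (curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200) {
            $bodies[$url] = curl_multi_getcontent($ch);
        }
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $bodies;
}
```

One way to use this would be to build the full list of URLs, split it with array_chunk into batches of a modest size (say 20, an arbitrary figure), call fetch_batch on each batch, and pass every returned body to extract_price. Whether this is acceptable depends on how much concurrent load the target site tolerates.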
Any help at all is greatly appreciated since I really need a solution to this!
Thank you in advance for your help!