I found a nice piece of code, or rather a theoretical example of what the code for a decent web parser might look like. When I ran into this problem myself a while back, I could not get the script to run indefinitely; in fact, I couldn't keep it running for more than a few hours.

This was after I set the following:

    set_time_limit(0);
    ini_set("memory_limit", "800M");
    // in some cases I set both, in others just one or the other

I have been reading about how to get PHP to run for a longer time span, indefinitely to be exact. I found many suggestions for a cron job instead of PHP, yet I would like to find a way to have this done in PHP.

I would love some examples, even theoretical ones if that's all you can muster.

I would like to use the block of code I mentioned above, which I found here, as a reference to get the discussion started on the right path.

How to write a crawler?

    while(list of unvisited URLs is not empty) {
        take URL from list
        fetch content
        record whatever it is you want to about the content
        if content is HTML {
            parse out URLs from links
            foreach URL {
                if it matches your rules
                    and it's not already in either the visited or unvisited list
                    add it to the unvisited list
            }
        }
    }
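
In PHP, the loop above might be sketched roughly as follows. This is only an illustration under stated assumptions: `crawl()`, `extractLinks()`, `$fetch`, `$accept` and `$maxPages` are hypothetical names that do not come from the referenced answer, the fetcher is passed in as a callable that returns the HTML as a string (or `null` when the content is not HTML or the request failed), and relative links are skipped to keep it short.

```php
<?php
// Sketch only: every function and parameter name here is hypothetical.

/** Return the absolute http(s) URLs found in <a href="..."> attributes. */
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    // Real-world HTML is rarely valid, so collect parser errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Assumption: relative links are ignored in this sketch.
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}

/** Breadth-first crawl; returns the URLs in the order they were visited. */
function crawl(array $seeds, callable $fetch, callable $accept, int $maxPages = 100): array
{
    $unvisited = array_values($seeds);
    $seen = array_fill_keys($seeds, true); // visited and unvisited URLs together
    $visited = [];

    while ($unvisited && count($visited) < $maxPages) {
        $url = array_shift($unvisited);    // take URL from list
        $html = $fetch($url);              // fetch content
        $visited[] = $url;                 // record what you want about it here
        if ($html === null) {
            continue;                      // not HTML, or the fetch failed
        }
        foreach (extractLinks($html) as $link) {
            if ($accept($link) && !isset($seen[$link])) {
                $seen[$link] = true;
                $unvisited[] = $link;      // add it to the unvisited list
            }
        }
    }
    return $visited;
}
```

Passing the fetcher in as a callable means the loop can be tried against an in-memory array of pages before it is pointed at real sites; `$maxPages` is there only so the sketch always terminates.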
RmH
  • `I found many suggestions for a cron job instead of PHP, yet I would like to find a way to have this done in PHP.` - these are not mutually exclusive. You can easily set up a cron job to run a PHP script. For something like this, that is what you want to do: you don't want to sit with a web browser open on a page, waiting for it to complete the task. When you run PHP from the command line (which you are doing if it is set up as a cron job) you don't need to worry about the time limit; at the command line there is no limit by default. – DaveRandom Feb 02 '12 at 10:57
  • @DaveRandom A for-instance would be lovely. – RmH Feb 02 '12 at 11:00
  • What's the problem? We run PHP scripts for weeks at work and they run just fine. – Vyktor Feb 02 '12 at 13:00
  • So what exactly is your question? – Gumbo Feb 02 '12 at 13:05
  • @Vyktor Say this parser was running with two arrays acting as its memory; do you think it would run all the same? – RmH Feb 02 '12 at 17:31
  • @RmH It should work and run just fine. Are you getting any errors? – Vyktor Feb 02 '12 at 17:34
  • @Vyktor None, just a crash out of the blue. A side question: do you know how to enable the PHP COM class on WAMP? – RmH Feb 02 '12 at 18:07
  • @RmH I've never used PHP under Windows. – Vyktor Feb 02 '12 at 18:09

1 Answer

Use cron jobs WITH PHP, not instead of it.

You can run PHP scripts as a cron job on a Linux server as follows:

    <time/frequency> <path to PHP> <php script full path>

For example, this will run every minute:

    * * * * * /usr/bin/php -q /var/www/html/cron/parser.php
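
Note that a cron entry that fires every minute starts a new PHP process each time, even if the previous one is still parsing. One way to avoid overlapping runs is a non-blocking file lock at the top of the script. The sketch below is only an illustration: `runExclusive()` and the lock file path are hypothetical, and it relies on PHP's built-in `flock()`.

```php
<?php
// Sketch only: runExclusive() is a hypothetical helper, not part of PHP.

/**
 * Run $job only if no other process currently holds the lock file.
 * Returns true if the job ran, false if another run was still busy.
 */
function runExclusive(string $lockFile, callable $job): bool
{
    // Mode 'c' creates the file if needed without truncating it.
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false; // the previous run is still going, so skip this one
    }
    try {
        $job();
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
    return true;
}
```

In a script such as parser.php this might be used as `runExclusive('/tmp/parser.lock', function () { /* parse one batch */ });`, where the lock path is an assumption. If a run crashes, the operating system releases the lock when the process exits, so the next cron run can start normally.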

On Windows you can use the Task Scheduler to run the script. It is located in Control Panel; on Windows 7 it is under Administrative Tools within Control Panel.

I have used a combination of PHP, MySQL, cURL and cron jobs to have a web application run indefinitely until it has parsed all the data I want to extract from the URLs.
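
To make that concrete, each cron run can handle a small batch and save the remaining work for the next run, so no single PHP process has to live for hours. The sketch below is not my production code: `processBatch()`, `$handle` and `$batchSize` are hypothetical names, and a JSON file stands in for the MySQL table purely to keep the example self-contained.

```php
<?php
// Sketch only: a JSON file replaces the MySQL queue for illustration.

/**
 * Process up to $batchSize queued URLs and persist whatever is left.
 * $handle receives one URL and may return an array of newly found URLs.
 * Returns the number of URLs still waiting in the queue.
 */
function processBatch(string $queueFile, callable $handle, int $batchSize = 50): int
{
    $queue = [];
    if (is_file($queueFile)) {
        $decoded = json_decode((string) file_get_contents($queueFile), true);
        $queue = is_array($decoded) ? $decoded : [];
    }

    $done = 0;
    while ($queue && $done < $batchSize) {
        $url = array_shift($queue);
        // Append anything the handler discovered to the end of the queue.
        foreach ((array) $handle($url) as $found) {
            $queue[] = $found;
        }
        $done++;
    }

    // Save the remainder so the next cron run picks up where this one stopped.
    file_put_contents($queueFile, json_encode(array_values($queue)), LOCK_EX);
    return count($queue);
}
```

Once the function returns 0 the queue is empty, and further cron runs simply find nothing to do. This sketch does not track already visited URLs; in a real setup that check would live in the handler or in the database.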

Leo Haris