
I have written a scraper that scrapes HTML and then uses an API to get some data; since it's very lengthy code, I haven't posted it here. I have implemented a random sleep method and use it within my code to throttle requests. But I want to make sure I don't overrun this code, so my idea is to run it for 3-4 hours, then take a breather, and then run it again. I haven't done anything like this in Python, and after searching I'm still not sure where to start, so some guidance would be great. If Python has a specific module for this, a link to it would be a great help.

Also, is this relevant? I don't think I need this level of complication.

Suggestions for a Cron like scheduler in Python?

I have functions for every single scraping task, and I have main method calling all those functions.
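A minimal sketch of the structure described above, assuming hypothetical task functions `scrape_listing` and `fetch_api_data` (stand-ins for the real scraping code, which isn't shown) and a random pause drawn with `random.uniform` after each task:

```python
import random
import time


def scrape_listing():
    # Hypothetical stand-in for one HTML-scraping task.
    return "listing"


def fetch_api_data():
    # Hypothetical stand-in for one API call.
    return "api"


# The main function simply walks through every task in order.
TASKS = [scrape_listing, fetch_api_data]


def polite_sleep(min_s=1.0, max_s=3.0):
    # Random pause so requests are not sent at a fixed rhythm.
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay


def main(min_s=1.0, max_s=3.0):
    results = []
    for task in TASKS:
        results.append(task())
        polite_sleep(min_s, max_s)
    return results
```

Calling `main()` runs each task once, pausing 1 to 3 seconds after each; the pause bounds are arbitrary placeholders.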

add-semi-colons

2 Answers


You could just note the time you started and, each time you want to run something, make sure you haven't exceeded the given maximum. Something like this should get you started:

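from datetime import datetime

MAX_SECONDS = 3600  # one hour; use e.g. 3 * 3600 for three hours

# note the time you have started
start = datetime.now()

while True:
    current = datetime.now()
    diff = current - start
    # total_seconds() counts the full elapsed time; .seconds alone drops whole days
    if diff.total_seconds() >= MAX_SECONDS:
        # break the loop after MAX_SECONDS
        break

    # MAX_SECONDS not exceeded, run more tasks
    scrape_some_more()  # your own scraping function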

Here's the link to the datetime module documentation.
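The same time-budget check can be packaged as a reusable function. This sketch swaps in `time.monotonic()` for `datetime.now()`, since a monotonic clock is not affected by changes to the system clock. `run_for` is a hypothetical helper name and `task` stands for your own scraping function. The limit is only checked between calls, so a single long task can still overrun it.

```python
import time


def run_for(max_seconds, task):
    """Call task() repeatedly until max_seconds have elapsed; return the call count."""
    start = time.monotonic()
    calls = 0
    # Check the budget before each call, as in the loop above.
    while time.monotonic() - start < max_seconds:
        task()
        calls += 1
    return calls
```

For a three-hour run you would call something like `run_for(3 * 3600, scrape_some_more)`.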

kgr

You can use a threading.Timer object to schedule an interrupt signal to the main thread after the time is exceeded:

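import _thread  # Python 3 name of the Python 2 "thread" module
import threading

def longjob():
    try:
        # do your job
        while True:
            print('*', end='', flush=True)
    except KeyboardInterrupt:
        # do your cleanup
        print('ok, giving up')

def terminate():
    print('sorry, pal')
    _thread.interrupt_main()  # raises KeyboardInterrupt in the main thread

time_limit = 5  # terminate in 5 seconds
timer = threading.Timer(time_limit, terminate)
timer.start()
longjob()
timer.cancel()  # in case longjob() finishes before the limit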

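Put this script in your crontab and schedule it to run every time_limit + 2 minutes, so that each run is stopped after time_limit and gets roughly a two-minute break before cron starts the next one.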
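`interrupt_main()` stops the job wherever it happens to be, possibly in the middle of a request or a file write. A gentler variation (a swapped-in technique, not part of the answer above) lets the `threading.Timer` set a `threading.Event`, and the job checks that flag between tasks so it can stop at a clean point. `do_one_task` and `run_with_limit` are hypothetical names used only for illustration.

```python
import threading
import time


def do_one_task():
    # Hypothetical stand-in for one unit of scraping work.
    time.sleep(0.01)


def longjob(stop_event):
    done = 0
    # Check the flag between tasks so each task finishes cleanly.
    while not stop_event.is_set():
        do_one_task()
        done += 1
    # Cleanup (closing sessions, saving progress) would go here.
    return done


def run_with_limit(time_limit):
    stop_event = threading.Event()
    # After time_limit seconds the timer thread sets the flag.
    timer = threading.Timer(time_limit, stop_event.set)
    timer.start()
    try:
        return longjob(stop_event)
    finally:
        timer.cancel()  # harmless if the timer already fired
```

`run_with_limit(3 * 3600)` would then work for about three hours and return how many tasks it completed; the cron schedule stays the same.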

georg
  • Thanks for a great example. I have never dealt with cron jobs; is there a good tutorial I can follow on setting up cron jobs/crontab? – add-semi-colons Oct 20 '12 at 17:47
  • @Null-Hypothesis: here you go: http://www.unixgeeks.org/security/newbie/unix/cron-1.html – georg Oct 20 '12 at 19:02
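If you would rather keep everything inside Python instead of using cron, the standard-library `sched` module can drive the run-then-rest cycle: each run schedules the next one after a rest period. This is a minimal sketch; `run_window` and `cycle` are hypothetical helper names, and `work` stands for your own time-limited job.

```python
import sched
import time


def run_window(scheduler, work, rest_seconds, runs_left, log):
    # Run one time-limited batch of work, then schedule the next batch.
    log.append(work())
    if runs_left > 1:
        scheduler.enter(rest_seconds, 1, run_window,
                        (scheduler, work, rest_seconds, runs_left - 1, log))


def cycle(work, rest_seconds, runs):
    log = []
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    # The first batch starts immediately.
    scheduler.enter(0, 1, run_window, (scheduler, work, rest_seconds, runs, log))
    scheduler.run()  # blocks until no events remain
    return log
```

For example, `cycle(your_main, rest_seconds=30 * 60, runs=4)` would run a hypothetical `your_main` four times with a 30-minute break in between; the process must stay alive for the whole cycle, which is the main trade-off against cron.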