I want to write a scraper in Python that crawls some URLs, scrapes data, and saves it. I know how to write it as a simple program. I'm looking for a way to deploy it on my virtual server (running Ubuntu) as a service so that it crawls non-stop. Could anyone tell me how I can do this?
– Bardia Heydari
- So which part do you need help on? Letting it run for a long time? Writing a crawler? Using Python with Linux? – Sleep Deprived Bulbasaur Aug 14 '14 at 13:24
- Letting it run for a long time :) @SleepDeprivedBulbasaur – Bardia Heydari Aug 14 '14 at 13:25
- Use scrapy to build your own crawler. Don't ever let the queue of URLs that it's scraping go empty. http://doc.scrapy.org/en/latest/topics/spiders.html (see the sketch after these comments) – FrobberOfBits Aug 14 '14 at 13:34
- How about `while True: scrape()`? That will run for a long time. – Kevin Aug 14 '14 at 13:34
- @BardiaHeydari you want to look into daemonizing it. – Sleep Deprived Bulbasaur Aug 14 '14 at 13:38
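Following up on the Scrapy suggestion above, a minimal spider might look like the sketch below. The spider name, start URL, and selectors are placeholders (none come from the thread), and a reasonably recent Scrapy release is assumed:

import scrapy

class SimpleSpider(scrapy.Spider):
    name = "simple_spider"               # hypothetical spider name
    start_urls = ["http://example.com"]  # placeholder starting URL

    def parse(self, response):
        # Save something from each page; this selector is just an example.
        yield {"url": response.url, "title": response.css("title::text").get()}
        # Follow every link found, so the queue of URLs never goes empty.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

Running it with scrapy runspider simple_spider.py -o items.json writes the scraped items to a file.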
1 Answer
What you want to do is daemonize the process. The python-daemon library will be helpful in creating a daemon.
Daemonizing the process allows it to run in the background, so as long as the server is running (even if no user is logged in) it will continue running.
Here is an example daemon that writes the time to a file.
import daemon
import time

def do_something():
    # Rewrite the current time to a file every five seconds, forever.
    while True:
        with open("/tmp/current_time.txt", "w") as f:
            f.write("The time is now " + time.ctime())
        time.sleep(5)

def run():
    # DaemonContext detaches the process from the controlling terminal
    # and keeps it running in the background.
    with daemon.DaemonContext():
        do_something()

if __name__ == "__main__":
    run()
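The same pattern applies to the crawler in the question. Here is a rough sketch, assuming the requests library is installed; the URL list, output path, and sleep interval are placeholders, not part of the original answer:

import daemon
import time
import requests  # assumption: the requests library is available

URLS = ["http://example.com"]  # placeholder list of URLs to crawl

def crawl_forever():
    while True:
        for url in URLS:
            try:
                response = requests.get(url, timeout=30)
                # Append the raw page to a file; replace with real parsing/saving.
                with open("/tmp/scraped_data.txt", "a") as f:
                    f.write(response.text)
            except requests.RequestException:
                pass  # skip unreachable URLs; add logging as needed
        time.sleep(60)  # pause between passes so the loop isn't a busy spin

if __name__ == "__main__":
    with daemon.DaemonContext():
        crawl_forever()

Started with python crawler.py, the process detaches from the terminal and keeps crawling until the server stops or the process is killed.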

– Sleep Deprived Bulbasaur