I want to write a scraper in Python that crawls some URLs, scrapes data, and saves it. I know how to write it as a simple program. I'm looking for a way to deploy it on my virtual server (running Ubuntu) as a service so that it crawls non-stop. Could anyone tell me how I can do this?
– Bardia Heydari
- So which part do you need help on? Letting it run for a long time? Writing a crawler? Using Python with Linux? – Sleep Deprived Bulbasaur Aug 14 '14 at 13:24
- Letting it run for a long time :) @SleepDeprivedBulbasaur – Bardia Heydari Aug 14 '14 at 13:25
- Use scrapy to build your own crawler. Don't ever let the queue of URLs that it's scraping go empty. http://doc.scrapy.org/en/latest/topics/spiders.html (see the sketch after these comments) – FrobberOfBits Aug 14 '14 at 13:34
- How about `while True: scrape()`? That will run for a long time. – Kevin Aug 14 '14 at 13:34
- @BardiaHeydari you want to look into daemonizing it. – Sleep Deprived Bulbasaur Aug 14 '14 at 13:38
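Following up on the Scrapy suggestion above, a minimal spider might look like the sketch below. The spider name, start URL, and selectors are placeholders (none come from the thread), and a reasonably recent Scrapy release is assumed:

import scrapy

class SimpleSpider(scrapy.Spider):
    name = "simple_spider"               # hypothetical spider name
    start_urls = ["http://example.com"]  # placeholder starting URL

    def parse(self, response):
        # Save something from each page; this selector is just an example.
        yield {"url": response.url, "title": response.css("title::text").get()}
        # Follow every link found, so the queue of URLs never goes empty.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

Running it with scrapy runspider simple_spider.py -o items.json writes the scraped items to a file.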
1 Answer
What you want to do is daemonize the process. The python-daemon library will be helpful in creating a daemon.
Daemonizing the process allows it to run in the background, so as long as the server is running (even if no user is logged in) it will continue running.
Here is an example daemon that writes the time to a file.
import daemon
import time

def do_something():
    # Rewrite the current time to a file every five seconds, forever.
    while True:
        with open("/tmp/current_time.txt", "w") as f:
            f.write("The time is now " + time.ctime())
        time.sleep(5)

def run():
    # DaemonContext detaches the process from the controlling terminal
    # and keeps it running in the background.
    with daemon.DaemonContext():
        do_something()

if __name__ == "__main__":
    run()
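The same pattern applies to the crawler in the question. Here is a rough sketch, assuming the requests library is installed; the URL list, output path, and sleep interval are placeholders, not part of the original answer:

import daemon
import time
import requests  # assumption: the requests library is available

URLS = ["http://example.com"]  # placeholder list of URLs to crawl

def crawl_forever():
    while True:
        for url in URLS:
            try:
                response = requests.get(url, timeout=30)
                # Append the raw page to a file; replace with real parsing/saving.
                with open("/tmp/scraped_data.txt", "a") as f:
                    f.write(response.text)
            except requests.RequestException:
                pass  # skip unreachable URLs; add logging as needed
        time.sleep(60)  # pause between passes so the loop isn't a busy spin

if __name__ == "__main__":
    with daemon.DaemonContext():
        crawl_forever()

Started with python crawler.py, the process detaches from the terminal and keeps crawling until the server stops or the process is killed.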

– Sleep Deprived Bulbasaur