0

I have written a piece of python code that scrapes the odds of horse races from a bookmaker's site. I wish to now:

  1. Run the code at prescribed increasingly frequent times as the race draws closer.

  2. Store the scraped data in a database fit for extraction and statistical analysis in R.

Apologies if the question is poorly phrased/explained - I'm entirely self taught and so have no formal training in computer science. I don't know how to tell a piece of python code to run itself every say n-minutes and I also have no idea how to correctly approach building such a data base or what factors I should be considering. Can someone point me in the right direction for getting started on the above?

  • There are two questions. The first is how to schedule a job and the second relates to the database. The second question is really too broad. For the first, the answer depends on your operating system. We use Windows Task Scheduler to schedule python jobs. I think most people who use Linux flavors use something called Cron. You might start with Google and when you get hit with specific issues come back. – PyNEwbie Apr 28 '17 at 01:25
  • Python has a built-in scheduler. [Find out more](http://stackoverflow.com/q/373335/146325) – APC Apr 28 '17 at 14:27

1 Answers1

0

In windows, you can use Task Scheduler or in Linux crontab. You can configure these to run python with your script at set intervals of time. This way you don't have a python script continuously running preventing some hangup in a single call from impacting all subsequent attempts to scrape or store in database.

To store the data there are many options which could either be a flat file(write to a text file), save as a python binary(use shelf or pickle), or install an actual database like MySQL or PostgreSQL and many more. Google for info. Additionally an ORM like SQLAlchemy may make controlling, querying, and updating your database a little easier since it will handle all tables as objects and create the SQL itself so you don't need to code all queries as strings.

recnac
  • 3,744
  • 6
  • 24
  • 46