
I have the following code, designed to pull JSON data from a website and record it to a CSV file:

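import json
import urllib.request

import pandas as pd

def rec_price():
    # Fetch and decode the JSON payload ('some_url' is a placeholder)
    with urllib.request.urlopen('some_url') as url:
        data = json.loads(url.read().decode())
    df = pd.DataFrame(data)

    df1 = df[['bpi', 'time']]

    x = df1.loc['USD', 'bpi']['rate']   # USD rate from the 'bpi' column
    y = df1.loc['updated', 'time']      # last-updated timestamp

    df2 = pd.DataFrame({'data': [x], 'time': [y]})

    df2['time'] = pd.to_datetime(df2['time'])

    # Append one row per call, without repeating the header
    with open('out.csv', 'a') as f:
        df2.to_csv(f, header=False)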

I would like to run this code every 60 seconds, indefinitely. It seems the two options available are to install apscheduler or to use Python's standard `sched` and `time` modules. What are the differences between the two? Is one better suited to the task? How would I implement it?

zsad512
  • What about `while True` with a `sleep(60)` between function calls? – Chen A. Aug 27 '17 at 21:54
  • Windows or *nix? On *nix systems a better solution would be `cron`. – sKwa Aug 27 '17 at 21:57
  • @sKwa - this will be run on a MacBook Pro, if that answers your question – zsad512 Aug 27 '17 at 21:58
  • @Vinny based on another response here https://stackoverflow.com/questions/474528/what-is-the-best-way-to-repeatedly-execute-a-function-every-x-seconds-in-python, I figured your method would not be effective – zsad512 Aug 27 '17 at 21:59
  • cron still works for Mac – OneCricketeer Aug 27 '17 at 22:46
  • Having worked on a similar project which required a script to run every x minutes to access a website and save some information to a csv file, I can recommend using `cron` to initiate each instance of the script, rather than trying to have a script run indefinitely. It's much easier and will save you a lot of pain, troubleshooting, and unneeded error handling. – cddt Aug 27 '17 at 22:47
  • @cddt I am new to programming - could you explain further or give an example? – zsad512 Aug 27 '17 at 22:49
  • I'll let you search for how `cron` works and how to use it, but in principle it lets you run a command (e.g. execute a script) on a schedule you specify (e.g. every minute, every day at 1:23 a.m., etc.). So rather than having your script run continuously, it is run again every time you want to check the website and update your CSV file. Does that help? – cddt Aug 28 '17 at 00:22
  • The trouble with cron, or Scheduled Tasks on Windows, is setting them up automatically when your program is installed. The recommended method is very good, but don't worry: if you use the example from my answer, it won't impact the OS's performance. The OS almost completely ignores your script while it is sleep()-ing. – Dalen Aug 28 '17 at 10:25
  • @Dalen what answer? – zsad512 Aug 28 '17 at 13:25
  • The one I wrote here 15 hours ago! – Dalen Aug 28 '17 at 13:37
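On the `while True` / `sleep(60)` suggestion from the comments: one common concern with a plain loop is drift, because each cycle lasts 60 seconds *plus* however long the fetch took. Below is a minimal sketch, assuming the job is an ordinary function such as `rec_price()`, that schedules each call against a fixed timeline using `time.monotonic()`. The function name `run_every` and its `max_runs` parameter are hypothetical, added only so the loop can be stopped in a demo.

```python
import time


def run_every(interval, job, max_runs=None):
    """Call job() every `interval` seconds on a fixed timeline.

    Deadlines are computed from the start time, so the time job() itself takes
    is not added on top of each interval. max_runs is a hypothetical parameter
    (None = run forever) so the loop can be stopped in a demo or test.
    """
    next_run = time.monotonic()
    runs = 0
    while max_runs is None or runs < max_runs:
        # Sleep only for what is left until the next deadline (never negative).
        time.sleep(max(0.0, next_run - time.monotonic()))
        job()
        runs += 1
        next_run += interval


if __name__ == "__main__":
    # In the question's script this would be run_every(60, rec_price).
    run_every(1, lambda: print("tick", time.strftime("%H:%M:%S")), max_runs=3)
```

If a single call overruns the interval, this sketch runs the late calls back to back to catch up; skipping missed slots instead would need a little extra logic.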

1 Answer

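from threading import Timer

t = None  # It is advisable to keep the Timer() in a global, so it can be cancelled later with t.cancel()

def refresh():
    global t
    # Get your CSV and save it here (e.g. call rec_price()), then:
    t = Timer(60, refresh)
    t.daemon = True  # a daemon Timer does not keep the program alive by itself
    t.start()

refresh()
# If the script does nothing else, the main thread must keep running;
# once it exits, the daemon Timer is killed with it.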
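A related pattern, sketched here under the assumption that the job is a plain function like `rec_price()`, uses one long-lived thread and `threading.Event.wait(timeout)` instead of creating a new `Timer` each round. `wait()` sleeps like `time.sleep()`, but returns early once the event is set, which gives a clean way to stop the loop. The helper name `start_periodic` is hypothetical.

```python
import threading
import time


def start_periodic(interval, job):
    """Call job() now and then every `interval` seconds in a daemon thread.

    Returns a threading.Event; calling .set() on it stops the loop.
    """
    stop = threading.Event()

    def loop():
        job()  # first run right away, like the initial refresh() call
        # Event.wait(timeout) returns False when the timeout expires and True
        # once the event is set, so set() also cuts the current wait short.
        while not stop.wait(interval):
            job()

    threading.Thread(target=loop, daemon=True).start()
    return stop


if __name__ == "__main__":
    stop = start_periodic(1, lambda: print("tick", time.strftime("%H:%M:%S")))
    try:
        # The main thread must stay alive: when it exits, daemon threads die too.
        # A real script would loop here indefinitely (or until Ctrl+C).
        time.sleep(3.5)
    finally:
        stop.set()
```

As with the `Timer` version, each wait starts after the job finishes, so the period is 60 seconds plus the job's runtime.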

Or:

import socket
import threading
from time import sleep
from urllib.error import URLError, HTTPError

def refresh():
    while True:
        try:
            # Get and save your CSV here (e.g. call rec_price())
            pass
        except (URLError, HTTPError, socket.timeout):
            pass  # network hiccup: skip this round and retry after the sleep
        except Exception:
            break  # anything unexpected: stop the loop
        sleep(60)  # outside the try, so a failed fetch doesn't retry in a tight loop

threading.Thread(target=refresh, daemon=True).start()
# Or just refresh() if you want your script to do just this and nothing else

To complete my answer: the sched module does something very similar to the code above, but it lets you schedule an arbitrary number of function calls at any time, and you can give events priorities, which decide the order in which events due at the same moment run. In short, it emulates part of cron. For what you need, though, it would be overkill: you would have to set up an event to fire after a fixed amount of time, then re-add it after each execution, and so on. You use sched when you have more than one function to fire at different intervals, or with different arguments, etc. To be honest, I would personally never use the sched module; it is too rough. Instead, I would adapt the code presented above to emulate sched's capabilities.
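For comparison, here is a minimal sketch of the "set up an event, then re-add it after it runs" pattern with the standard `sched` module. The helper `schedule_repeating` and its `runs_left` cap are hypothetical names; `sched.scheduler`, `enter()` and `run()` are the module's own API.

```python
import sched
import time


def schedule_repeating(scheduler, interval, job, runs_left=None):
    """Run job() now, then re-enter this function into `scheduler`.

    runs_left is a hypothetical cap on the number of runs (None = forever).
    """
    job()
    if runs_left is not None:
        runs_left -= 1
        if runs_left <= 0:
            return
    # enter(delay, priority, action, argument): priority only breaks ties
    # between events due at the same moment (lower number runs first).
    scheduler.enter(interval, 1, schedule_repeating,
                    (scheduler, interval, job, runs_left))


if __name__ == "__main__":
    s = sched.scheduler(time.monotonic, time.sleep)
    # In the question's script: schedule_repeating(s, 60, rec_price)
    schedule_repeating(s, 1, lambda: print("tick", time.strftime("%H:%M:%S")),
                       runs_left=3)
    s.run()  # blocks until the queue is empty (never, if runs_left is None)
```

Note that `s.run()` blocks the calling thread, and because each event is entered relative to when the previous job finished, the schedule drifts by the job's runtime; `enterabs()` with a precomputed deadline would avoid that.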

Dalen
  • How would you amend your first suggestion to apply to multiple functions? – zsad512 Aug 28 '17 at 13:51
  • It depends on how you need them to be executed. If it is the same function with changing arguments, I would put the arguments in a deque(); refresh() would then pop() from the queue in a while loop until the deque() is empty, then start a Timer() thread to try again after the interval if there is anything in the queue. That's roughly what sched does. If there are several different actions you would like to perform, put each one in its own function and call them inside refresh() one after another, or in threads, but wait for all of them to finish before setting the next Timer(). – Dalen Aug 28 '17 at 14:02
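Both suggestions in the comment above could look roughly like this; this is a sketch of one reading of the comment, not Dalen's own code. The names `timers`, `_rearm`, `refresh_all` and `drain` are hypothetical, and `popleft()` is used instead of `pop()` so queued arguments are processed first-in, first-out.

```python
import threading
from collections import deque

# Latest Timer of each loop, kept globally (as the answer advises) so it can be
# cancelled later. The loop names used as keys here are hypothetical.
timers = {}


def _rearm(name, interval, func, args):
    """Arm a daemon Timer that calls func(*args) after `interval` seconds."""
    timer = threading.Timer(interval, func, args=args)
    timer.daemon = True
    timers[name] = timer
    timer.start()


def refresh_all(jobs, interval):
    """Different actions: run each job in its own thread, wait for all of them
    to finish, and only then schedule the next round."""
    workers = [threading.Thread(target=job) for job in jobs]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # don't reschedule until every job has completed
    _rearm("refresh_all", interval, refresh_all, (jobs, interval))


def drain(pending, func, interval):
    """Same function, changing arguments: call func(*args) for every argument
    tuple queued in the deque until it is empty, then check again later."""
    while pending:
        func(*pending.popleft())  # popleft() gives first-in, first-out order
    _rearm("drain", interval, drain, (pending, func, interval))
```

For example, `refresh_all([rec_price, other_job], 60)` would run both jobs every minute (`other_job` being a hypothetical second function), while other code can keep queuing work for `drain` with `pending.append((arg,))`.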