Need to run a program which scrapes data once a day autonomously

Question

I created a program which scrapes an online table and writes an Excel file of it. In order for this data to be useful I need to scrape it once a day at exactly the same time every day (financial data).

I am planning on packaging the code into a Windows executable file and have it run autonomously on my VPS. Below is the code. How would I go about looping the program to run once a day at exactly 9 am and creating an Excel file with the current date? Also how would I go about packaging the program into an .exe file? Any help is much appreciated.

'''
   This program will scrape the Dukascopy sentiment iframe and create a table of all currency pairs from the historical sentiment index.
   it also writes the data to an excel file.
'''
import requests
import re
import json
import pandas as pd

params = {
    'path': 'historical_sentiment_index/data',
    'type': 'swfx',
    'jsonp': '_callbacks____2k8ym6sn2'
}

headers = {
    'Referer': 'https://freeserv.dukascopy.com/2.0/?path=historical_sentiment_index/index&showHeader=false&tableBorderColor=%234bafe9&liquidity=consumers&availableInstruments=EUR/JPY%2CUSD/RUB%2CUSD/HKD%2CAUD/CAD%2CAUD/CHF%2CAUD/JPY%2CAUD/NZD%2CAUD/SGD%2CCAD/CHF%2CCAD/HKD%2CCAD/JPY%2CCHF/JPY%2CCHF/PLN%2CCHF/SGD%2CEUR/AUD%2CEUR/CAD%2CEUR/CHF%2CEUR/DKK%2CEUR/GBP%2CEUR/HKD%2CEUR/NOK%2CEUR/NZD%2CEUR/PLN%2CEUR/RUB%2CEUR/SEK%2CEUR/SGD%2CEUR/TRY%2CGBP/AUD%2CGBP/CAD%2CGBP/CHF%2CGBP/JPY%2CGBP/NZD%2CHKD/JPY%2CNZD/CAD%2CNZD/CHF%2CNZD/JPY%2CSGD/JPY%2CTRY/JPY%2CUSD/CNH%2CUSD/DKK%2CUSD/MXN%2CUSD/NOK%2CUSD/PLN%2CUSD/SEK%2CUSD/SGD%2CUSD/TRY%2CUSD/ZAR%2CZAR/JPY&availableCurrencies=AUD%2CCAD%2CCHF%2CGBP%2CHKD%2CJPY%2CMXN%2CNOK%2CNZD%2CPLN%2CRUB%2CSEK%2CSGD%2CTRY%2CUSD%2CZAR%2CEUR%2CXAG%2CXAU&sort=volume&order=asc&last=true&sixhours=true&oneday=true&fivedays=true&width=100%25&height=1385&adv=popup&lang=en',
}


def main(url):
    with requests.Session() as req:
        r = req.get(
            "https://freeserv.dukascopy.com/2.0/index.php", params=params, headers=headers)
        match = re.search(r"2k8ym6sn2\((\[.*?])", r.text).group(1)
        data = json.loads(match)
        global df
        df = pd.DataFrame(data).set_index("name")
        #print(df)  # For Full DataFrame view.
        #df.to_csv("data.csv")  # to save it in CSV file
        #print(df.loc['AUD/CAD']) # you can have it as list or dict with to_list() or to_dict()

main("https://freeserv.dukascopy.com/2.0/index.php")


#Creating a dataframe of the needed currency pairs to analyze
df2 = df.loc[['EUR/USD',
'USD/CAD','GBP/USD','USD/CHF','USD/JPY','GBP/USD','CAD/CHF','AUD/CAD','AUD/CHF','AUD/JPY','AUD/NZD','AUD/SGD','CAD/CHF','CAD/HKD','CAD/JPY',
'CHF/JPY','CHF/SGD','EUR/AUD','EUR/CAD','EUR/CHF','EUR/HKD','EUR/JPY','EUR/NZD','EUR/SEK','EUR/SGD','EUR/TRY','GBP/AUD','GBP/CAD','GBP/CHF',
'GBP/JPY','GBP/NZD','HKD/JPY','NZD/CAD','NZD/CHF','NZD/JPY','SGD/JPY','TRY/JPY','USD/CNH','USD/HKD','USD/RUB','USD/SEK','USD/SGD','USD/TRY',
'USD/ZAR','XAU/USD','XAG/USD']]

#Dropping the liquidity provider sentiment
df2 = df2.drop(['last_long','sixhours_long','oneday_long','fivedays_long'],axis=1)
#Renaming the columns for clarity
df2.columns = ['Last update','6hours','1day','5days']
#Sort the table into alphabetical order
df2.sort_values(by='name',axis='index',ascending=True,inplace=True)

#Writing the data into an excel file on the desktop
output_path = r'C:/Users/II/Desktop/Sentiment Data/April,18,2020 - Sentiment Data.xlsx'
from pandas import ExcelWriter
from pandas import ExcelFile

writer = ExcelWriter(output_path)
df2.to_excel(writer,'Sheet1',index=True,)
writer.save()

Simple solution: https://stackoverflow.com/questions/7195503/setting-up-a-cron-job-in-windows -> try to schedule task as python job — shyam, Apr 18 '20 at 18:19
Well i also need the file name to be the exact date, I dont want to run the program every day and it writes the every file with the same name, then Ill be confused — acidtoy, Apr 18 '20 at 18:59
Construct the filename in your script using the current timestamp. Details in the documentation for datetime in the standard library. — Eric Truett, Apr 18 '20 at 19:19

score 0 · Answer 1 · answered Apr 19 '20 at 05:48

0

simply add the following code to your script

import datetime
files_save = str(datetime.date.today().strftime("%d %B %Y"))+'.xlsx'

and schedule your using Task Scheduler

ake sure you logged on as an administrator or you have the same access as an administrator.

Start->Control Panel->System and Security->Administrative Tools->Task Scheduler Action->Create Basic Task->Type a name and Click Next

Follow through the wizard.

answered Apr 19 '20 at 05:48

Manish Chaudhary

498
6
14

Do I need to have an IDE on my server because I dont, also do I need to convert this to an exe file before running it? Im not administrator on the server, thats why I want to use the schedule libarary to do this. – acidtoy Apr 19 '20 at 17:25
just add the your python script to over schedule task and grant all permissions once and make sure you write your command to run your script correctly. – Manish Chaudhary Apr 20 '20 at 05:36

Need to run a program which scrapes data once a day autonomously

1 Answers1