so here is a simple web app I deployed recently: https://covid19-visualisation.herokuapp.com/

I used the Dash framework and deployed it using Heroku.

I use df = pd.read_csv('owid-covid-data.csv') to load the data set. The data set can be found here:

https://ourworldindata.org/covid-vaccinations

The data is being updated every day. Is there a way to automate this process like passing some kind of a link into this read_csv function?

smci
  • With `requests` you can automate scraping the website and downloading the data. – Mitchell Olislagers Jan 31 '21 at 03:35
  • This is strictly just a question about pandas (and maybe Heroku, depending on your directory structure); please don't tag it with every package you used: [tag:plotly], or [tag:hyphen] (because you typed 'dash'?) – smci Jan 31 '21 at 03:43
  • So do you want to stream in the daily dataset, then save it in some named/dated format, and/or directory structure in Heroku? Or just overwrite today's daily with yesterday's? – smci Jan 31 '21 at 03:45
  • You can do [Pandas read_csv directly from url](https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url) – smci Jan 31 '21 at 03:46

2 Answers

You can try requests like this:

import pandas as pd
import io
import requests
url = "https://covid.ourworldindata.org/data/owid-covid-data.csv?v=2021-01-31"
r = requests.get(url).content
contents_df = pd.read_csv(io.StringIO(r.decode('utf-8')))

You will probably have to change the date at the end of the URL every day to get the most current file.
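If you want to avoid editing the date by hand, one option is to build the URL from today's date. This is only a sketch: `daily_url` is a hypothetical helper name, and it assumes the `?v=` query string is just a cache-buster the server otherwise ignores.

```python
from datetime import date

def daily_url(day: date) -> str:
    # Hypothetical helper: appends the given date as a ?v= cache-buster,
    # matching the URL shape used in the snippet above.
    base = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
    return f"{base}?v={day.isoformat()}"

# On any given day, fetch with the current date:
url = daily_url(date.today())
```

You could then pass `url` into the `requests.get` call shown above.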

Jorge

You can do Pandas read_csv() directly from a URL. Presumably surround that with code to handle fault-tolerance, retries, etc.

No need to even store it as a file, unless you want persistence to disk.
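As a rough sketch of the fault-tolerance mentioned above (the `read_csv_with_retries` name and its parameters are made up for illustration, not a pandas API), a thin retry wrapper around `pd.read_csv` might look like:

```python
import time

import pandas as pd

def read_csv_with_retries(url, retries=3, delay=5, loader=pd.read_csv):
    """Call loader(url), retrying on failure.

    loader is injectable so the retry logic can be exercised
    without hitting the network.
    """
    last_exc = None
    for attempt in range(retries):
        try:
            return loader(url)
        except Exception as exc:
            last_exc = exc
            if attempt < retries - 1:
                time.sleep(delay)  # back off before the next attempt
    raise last_exc

# Example usage (fetches over the network):
# df = read_csv_with_retries("https://covid.ourworldindata.org/data/owid-covid-data.csv")
```

Scheduling this to run daily (e.g. via cron or the Heroku Scheduler add-on) would keep the deployed app's data current.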

smci