so here is a simple web app I deployed recently: https://covid19-visualisation.herokuapp.com/

I used the Dash framework and deployed it using Heroku.

I use df = pd.read_csv('owid-covid-data.csv') to load the data set. The data set can be found here:

https://ourworldindata.org/covid-vaccinations

The data is being updated every day. Is there a way to automate this process like passing some kind of a link into this read_csv function?

smci
  • With `requests` you can automate scraping the website and downloading the data. – Mitchell Olislagers Jan 31 '21 at 03:35
  • This is strictly just a question about pandas (and maybe Heroku, depending on your directory structure); please don't tag it with every package you used: [tag:plotly], or [tag:hyphen] (because you typed 'dash'?) – smci Jan 31 '21 at 03:43
  • So do you want to stream in the daily dataset, then save it in some named/dated format, and/or directory structure in Heroku? Or just overwrite today's daily with yesterday's? – smci Jan 31 '21 at 03:45
  • You can do [Pandas read_csv directly from url](https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url) – smci Jan 31 '21 at 03:46

2 Answers

You can try requests like this:

import pandas as pd
import io
import requests
url = "https://covid.ourworldindata.org/data/owid-covid-data.csv?v=2021-01-31"
r = requests.get(url).content
contents_df = pd.read_csv(io.StringIO(r.decode('utf-8')))

You will probably have to change the date at the end of the URL every day to get the most current file.
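If you want to avoid editing the date by hand, one option is to build the URL from today's date. This is only a sketch: `daily_url` is a hypothetical helper name, and it assumes the `?v=` query string is just a cache-buster the server otherwise ignores.

```python
from datetime import date

def daily_url(day: date) -> str:
    # Hypothetical helper: appends the given date as a ?v= cache-buster,
    # matching the URL shape used in the snippet above.
    base = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
    return f"{base}?v={day.isoformat()}"

# On any given day, fetch with the current date:
url = daily_url(date.today())
```

You could then pass `url` into the `requests.get` call shown above.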

Jorge

You can do Pandas read_csv() directly from a URL. Presumably surround that with code to handle fault-tolerance, retries, etc.

No need to even store it as a file, unless you want persistence to disk.
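As a rough sketch of the fault-tolerance mentioned above (the `read_csv_with_retries` name and its parameters are made up for illustration, not a pandas API), a thin retry wrapper around `pd.read_csv` might look like:

```python
import time

import pandas as pd

def read_csv_with_retries(url, retries=3, delay=5, loader=pd.read_csv):
    """Call loader(url), retrying on failure.

    loader is injectable so the retry logic can be exercised
    without hitting the network.
    """
    last_exc = None
    for attempt in range(retries):
        try:
            return loader(url)
        except Exception as exc:
            last_exc = exc
            if attempt < retries - 1:
                time.sleep(delay)  # back off before the next attempt
    raise last_exc

# Example usage (fetches over the network):
# df = read_csv_with_retries("https://covid.ourworldindata.org/data/owid-covid-data.csv")
```

Scheduling this to run daily (e.g. via cron or the Heroku Scheduler add-on) would keep the deployed app's data current.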

smci