I'm trying to do two things here:
- Import all the .csv files and combine them into a single df.
- Update the df with the latest uploaded file.
I have been able to import one .csv with:
import pandas as pd

# one daily report from the CSSE COVID-19 repo
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-22-2020.csv'
df = pd.read_csv(url).fillna(0)
I could import all the .csv files one by one (or with a loop, if I knew how to extract all the .csv filenames), but there should be a more efficient way; the loop I have in mind is sketched after the list below. Once I have the df, to "update" it I would:
- Extract all the .csv filenames.
- Check (via the date column) whether all of them are already in the df. If one is missing, add that missing .csv file to the df (see the second sketch below).
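This is roughly the loop I have in mind for the first goal. It is only a sketch: the csv_names list is a placeholder (building it automatically is exactly question (b) below), and the date column is something I would add myself, taken from each filename:

import pandas as pd

RAW_BASE = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/'
            'master/csse_covid_19_data/csse_covid_19_daily_reports/')

def load_daily_report(name):
    # read one daily report and tag it with the date from its filename
    daily = pd.read_csv(RAW_BASE + name).fillna(0)
    daily['date'] = name[:-len('.csv')]  # e.g. '01-22-2020'
    return daily

# placeholder list; question (b) below asks how to build this automatically
csv_names = ['01-22-2020.csv', '01-23-2020.csv']

df_all = pd.concat([load_daily_report(n) for n in csv_names], ignore_index=True)

(I'm not sure all the daily files share the same columns; if they don't, pd.concat will take the union of columns and fill the gaps with NaN.)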
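And for the update step, something along these lines, reusing load_daily_report and the date column from the sketch above:

def update(df_all, csv_names):
    # append any daily report whose date is not yet in df_all
    have = set(df_all['date'])
    missing = [n for n in csv_names if n[:-len('.csv')] not in have]
    if not missing:
        return df_all  # nothing new to add
    return pd.concat([df_all] + [load_daily_report(n) for n in missing],
                     ignore_index=True)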
The problems I'm having are: (a) how can I make the extraction of all the .csv files scalable? and (b) is there any way to extract ONLY the filenames that end with .csv from the GitHub folder, in order to do the update step above?
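For (b), the closest thing I have found is the GitHub contents API, which lists a folder as JSON, so the names can be filtered in Python. A sketch of what I mean (whether this is the right tool here is part of what I'm asking):

import requests

API_URL = ('https://api.github.com/repos/CSSEGISandData/COVID-19/'
           'contents/csse_covid_19_data/csse_covid_19_daily_reports')

resp = requests.get(API_URL)
resp.raise_for_status()

# one JSON object per entry in the folder; keep only the .csv names
csv_names = [entry['name'] for entry in resp.json()
             if entry['name'].endswith('.csv')]

Two caveats I know of: unauthenticated requests to this API are rate-limited (currently 60 per hour), and the contents API returns at most 1,000 entries per directory, so a very large folder would need the Git Trees API instead.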