
I'm trying to do some exploratory data analysis on the COVID-19 data provided by CSSE at Johns Hopkins University. They host it on GitHub at https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports. I want to download the entire folder of daily reports with Python and save it to my current directory, so I always have the up-to-date data and can reload it whenever I need it. I'm planning two functions: fetch_covid_daily_data(), which goes to the repository and downloads all the CSV files, and load_covid_daily_data(), which reads the downloaded files from the current directory so I can process them with pandas.

I'm doing it this way because when I come back to my code I can call fetch_covid_daily_data() again and it will download any new changes, such as another daily CSV being added.
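As a sketch of the two functions described above (assuming the GitHub contents API at api.github.com is reachable; `is_daily_csv` is a small helper introduced here, and `fetch_covid_daily_data`/`load_covid_daily_data` are the names from the question):

```python
import glob
import json
import os
import urllib.request

import pandas as pd

# GitHub contents-API listing for the daily-reports folder
API_URL = ('https://api.github.com/repos/CSSEGISandData/COVID-19/'
           'contents/csse_covid_19_data/csse_covid_19_daily_reports')


def is_daily_csv(name):
    """The folder also holds a README; keep only the CSV files."""
    return name.endswith('.csv')


def fetch_covid_daily_data(target_dir='.'):
    """Download every daily-report CSV into target_dir, overwriting stale copies."""
    with urllib.request.urlopen(API_URL) as resp:
        listing = json.load(resp)
    for entry in listing:
        if is_daily_csv(entry['name']):
            path = os.path.join(target_dir, entry['name'])
            urllib.request.urlretrieve(entry['download_url'], path)


def load_covid_daily_data(target_dir='.'):
    """Read the downloaded CSVs back and concatenate them into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(target_dir, '*.csv')))
    return pd.concat((pd.read_csv(p) for p in paths),
                     ignore_index=True, sort=False)
```

Re-running fetch_covid_daily_data() simply re-downloads the listing, so any newly added daily CSV is picked up automatically.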

2 Answers


You can read data directly from an online CSV into a pandas DataFrame:

Example:

import pandas as pd

CONFIRMED_URL = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'

df = pd.read_csv(CONFIRMED_URL)

# df now contains data from time of call.

You can also create a class to fetch and manage all of the data:


import pandas as pd

class Corona:

    def __init__(self):
        BASE_URL = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series'

        self.URLS = {
            'confirmed': f'{BASE_URL}/time_series_covid19_confirmed_global.csv',
            'deaths': f'{BASE_URL}/time_series_covid19_deaths_global.csv',
            'recovered': f'{BASE_URL}/time_series_covid19_recovered_global.csv',
        }

        # download each CSV once, at construction time
        self.data = {case: pd.read_csv(url) for case, url in self.URLS.items()}

    # add other helper methods to work with the data
    def current_status(self):
        # e.g. summarise the latest figures
        pass


To get the current data:

corona = Corona()

# corona.data is a dictionary with DataFrames as values
confirmed_df = corona.data['confirmed']

# if you want to save one to CSV
confirmed_df.to_csv('confirmed.csv', index=False)

# show the first five rows
print(confirmed_df.head())

# check which other DataFrames are available
print(corona.data.keys())
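If you want the daily reports rather than the time-series files, their file names follow a date pattern, so you can build the raw URLs with a list comprehension and concatenate the results. This is a sketch under that assumption; `daily_report_urls` and `load_daily_reports` are hypothetical helpers, not part of the repository:

```python
import pandas as pd

# raw-file base URL for the daily-reports folder of the same repository
RAW_BASE = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
            'csse_covid_19_data/csse_covid_19_daily_reports')


def daily_report_urls(start, end):
    """One URL per day; the daily files are named MM-DD-YYYY.csv."""
    return [f'{RAW_BASE}/{d.strftime("%m-%d-%Y")}.csv'
            for d in pd.date_range(start, end)]


def load_daily_reports(start, end):
    """Download each day's CSV and concatenate them into one DataFrame."""
    frames = [pd.read_csv(url) for url in daily_report_urls(start, end)]
    return pd.concat(frames, ignore_index=True, sort=False)
```

This works on Google Colab too, since nothing is written to disk unless you call to_csv yourself.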
Prayson W. Daniel
  • Hi, yes, I have done your first example multiple times, but my problem is that I want to collect all those CSVs in the daily reports and join them together myself. I want to know if there's an easy way to do this in case I come across data that's split across multiple CSV files and I'll need to join them. I'm trying to do this on Google Colab, so I don't want to download the data – Knowlege_Collector May 03 '20 at 05:11
  • I also love your idea of using a class! – Knowlege_Collector May 03 '20 at 05:14
  • You can easily do that too. What I love about classes is that they help organise your code. To answer your multiple-CSV question: if there is a pattern in the CSV names, you can still use the class above with a list comprehension to get all the CSVs and then merge/concat/join them into one. I am happy to help if you provide a sample URL of the CSVs and what you would like to do. See https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe – Prayson W. Daniel May 03 '20 at 05:47

Assuming you have git installed, you can clone the repository from your terminal:

git clone https://github.com/CSSEGISandData/COVID-19
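The same clone-then-update flow can also be scripted from Python, so re-running the script pulls any newly added daily CSVs. This is a sketch assuming the git CLI is on your PATH; `sync_repo` is a name chosen here for illustration:

```python
import os
import subprocess

REPO_URL = 'https://github.com/CSSEGISandData/COVID-19'
REPO_DIR = 'COVID-19'


def sync_repo():
    """Clone on the first run; on later runs, pull whatever was added upstream."""
    if os.path.isdir(REPO_DIR):
        subprocess.run(['git', '-C', REPO_DIR, 'pull'], check=True)
    else:
        subprocess.run(['git', 'clone', REPO_URL], check=True)
```

After syncing, the daily CSVs live under COVID-19/csse_covid_19_data/csse_covid_19_daily_reports and can be read with pandas as usual.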

hope this helps!

Teejay Bruno