I am doing some work on Covid-19 and need to access .csv files on GitHub (specifically, the URL is https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series).

So I went to this page and downloaded the .csv files that interested me directly to my hard drive: C:\Users\ ... .csv. Then I import these files as pandas dataframes into a Jupyter notebook to work with Python, for example with: `dataD = pd.read_csv('C:/Users/path_of_my_file_on_my_computer ...')`.

It all works very well.

To make it easier to collaborate with other people, I was told that I should store the .csv files not on my C: drive but on Google Drive (https://drive.google.com/drive/my-drive), put the .ipynb files that I created in Jupyter notebook there as well, and then grant access to the people concerned.

So I created a folder on my Drive (say, Covid-19) and put these .csv files there, but I don't understand what Python code I am supposed to write at the beginning of my notebook to replace the simple previous instruction `dataD = pd.read_csv('C:/Users/path_of_my_file_on_my_computer ...')`, so that the program reads the data directly from my Google Drive and no longer from my C: drive.

I have looked at various posts that seem to speak more or less about this issue, but I don't really understand what to do.

I hope my question is clear enough. (I am attaching a picture of the situation in my Google Drive, in case it provides useful information. It is in French.)

Andrew

1 Answer

Given that your files are already hosted in the cloud and you are planning a collaborative scenario, I think the idea proposed by @Eric is actually the smarter one.

Approach 1:

Otherwise, if you can't rely on that data source, you will have to build an authorization flow so that your script can access Google Drive resources. The Google Drive API documentation explains in full how to build your Python script and interact with the API.

Approach 2:

Although the Google Drive API requires authorization to access file URLs, you can build a workaround. Google Drive generates export links that are accessible without authorization if your file is publicly available. In this Stack Overflow answer you can find more details about it.

In your Python script you can then read the data directly from the URL, without going through either the local file system or the Google Drive authorization flow.
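
As a rough illustration of Approach 2, here is a minimal sketch. It assumes the file has been shared as "anyone with the link" and that the share link has the usual `https://drive.google.com/file/d/<id>/view` shape. The `uc?export=download&id=...` form is the commonly used export link rather than a documented guarantee, and the helper names `drive_direct_url` and `load_drive_csv` are hypothetical.

```python
import re


def drive_direct_url(share_link):
    """Build a direct-download URL from a Google Drive share link.

    Assumes a link of the form https://drive.google.com/file/d/<id>/view...
    """
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_link)
    if match is None:
        raise ValueError("could not find a file id in the share link")
    # Commonly used export form; it only works for publicly shared files.
    return "https://drive.google.com/uc?export=download&id=" + match.group(1)


def load_drive_csv(share_link):
    """Read a publicly shared .csv file from Google Drive into a dataframe."""
    # Imported here so that the URL helper above works without pandas.
    import pandas as pd

    return pd.read_csv(drive_direct_url(share_link))
```

With this in place, the earlier `pd.read_csv('C:/Users/...')` line would become something like `dataD = load_drive_csv(share_link)`, where `share_link` is the link copied from the Drive sharing dialog. For large files, Drive may return a confirmation page instead of the data, in which case this shortcut does not apply.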

Alessandro
  • I am missing something... Following the idea proposed by @Eric, I used the code `data = pd.read_csv('https://github.com/CSSEGISandData/... ... ... /time_series_covid19_deaths_global.csv', sep=',', header=0)`, which contains the entire URL of the `csv` file, but I got a `ParserError` that I don't understand... – Andrew Apr 20 '20 at 16:12
  • Not for the moment @Alessandro. I am really a beginner and the information given is unfortunately too technical for me; I find it hard to understand what is said and what I have to do concretely. For example, although I have a Google Drive account, I don't know what the Google Drive API is ... I guess it's very simple once you have learned it, but, as I said, I am just beginning. For now, I will just save .csv files on my hard drive, and I will learn more gradually. Thanks anyway for the explanations; they will surely be useful to me. – Andrew Apr 21 '20 at 13:14
  • Hello @Andrew, I saw that you are accessing the GitHub page instead of the raw .csv file, hence the parsing error: `read_csv` is being given an HTML file. Try to access the GitHub resource through the https://raw.githubusercontent.com/ domain. For example, in your case: `https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv`. Using the page URL will cause problems with whichever approach you choose, so let me know if that works for you. – Alessandro Apr 21 '20 at 15:37
  • Hi @Alessandro, I tested your method and it works perfectly! I get a splendid dataframe. I was totally unaware of this `raw.githubusercontent.com` domain. I am very grateful that you took the time to solve this problem, which had been bothering me a lot. Thanks a lot! – Andrew Apr 22 '20 at 19:25
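
The fix discussed in the comments can be sketched as follows. This is a minimal example, assuming the page URL has the usual `https://github.com/<user>/<repo>/blob/<branch>/<file>` shape (the `blob` form is an assumption, since the full page URL is elided in the thread); the helper names `github_raw_url` and `load_github_csv` are hypothetical.

```python
from urllib.parse import urlsplit


def github_raw_url(page_url):
    """Turn a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(page_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # The path looks like /<user>/<repo>/blob/<branch>/<path to file>.
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected a /<user>/<repo>/blob/<branch>/<file> URL")
    # Drop the 'blob' segment and keep user, repo, branch and file path.
    user, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


def load_github_csv(page_url):
    """Read a .csv file hosted on GitHub into a pandas dataframe."""
    # Imported here so that the URL helper above works without pandas.
    import pandas as pd

    # The raw URL serves the file itself, not the HTML page around it.
    return pd.read_csv(github_raw_url(page_url))
```

Passing the raw URL straight to `pd.read_csv`, as in the comment above, works just as well; the helper only saves rewriting the address by hand.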