0

My Goal is to read a .csv file from google drive and load it to a dataframe.

I tried some answers here but the thing is, the file is not public and needs authentication.

I looked up on goggle drive API but I was stuck there and I don't know how to move forward. I did manage to open google sheet and load it to a dataframe but that is different, this is a sample for google sheet that works.

service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
sheets_file = sheet.values().get(
                     spreadsheetId=sheet_id,
                     range=sheet_range
                     ).execute()
    
header = sheets_file.get('values', [])[0]   # Assumes first line is header!
values = sheets_file.get('values', [])[1:]  # Everything else is data.
  
if not values:
    print('No data found.')
else:
    all_data = []
    for col_id, col_name in enumerate(header):
        column_data = []
        for row in values:
            column_data.append(row[col_id])
        ds = pd.Series(data=column_data, name=col_name)
        all_data.append(ds)
        df = pd.concat(all_data, axis=1)
        print(df.head())

I saw some google colab methods too but I cant use that as I am restricted to using python only, any Idea on how to approach this?

Led
  • 662
  • 1
  • 19
  • 41
  • 1
    If it requires authentication to get to the file, then you may have to use `Selenium` to open the webpage, login, and download. – David Erickson Sep 17 '20 at 23:36
  • I prefer to use google API as I have to change the flow if I use selenium – Led Sep 17 '20 at 23:53
  • Can I ask you about your question? 1. In your script, can you use `service` of `service = build('sheets', 'v4', credentials=creds)` for `sheet.values().get(###)`? I would like to ask whether you have already done the authorization of the scopes. 2. About `doing other format like csv/.xlsx`, which do you want to convert from Google Spreadsheet to? – Tanaike Sep 18 '20 at 00:08
  • yes I already have completed authentication, I did not include the boilerplate code. And no I dont want to convert it to google sheet. I only want to get the csv file and convert it to a dataframe – Led Sep 18 '20 at 00:22
  • Thank you for replying. From your replying, I proposed 3 sample scripts as an answer. Could you please confirm it? If those were not the direction you expect, I apologize. – Tanaike Sep 18 '20 at 01:25

1 Answers1

3

I believe your goal and situation as follows.

  • You want to download the CSV data from the CSV file on Google Drive.
  • You can get values from Google Spreadsheet using googleapis for python.

Pattern 1:

In this pattern, the CSV data is downloaded with googleapis. The downloaded CSV data is saved as a file. And the value is retrieved by the method of "Files: get" in Drive API v3.

Sample script:

file_id = "###"  # Please set the file ID of the CSV file.

service = build('drive', 'v3', credentials=creds)
request = service.files().get_media(fileId=file_id)
fh = io.FileIO("sample.csv", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
  • In this case, the CSV data can be converted to the dataframe with df = pd.read_csv("sample.csv").

Pattern 2:

In this pattern, as a simple method, the access token is used from creds. The downloaded CSV data is not saved as a file. And the value is retrieved by the method of "Files: get" in Drive API v3.

Sample script:

file_id = "###"  # Please set the file ID of the CSV file.

access_token = creds.token
url = "https://www.googleapis.com/drive/v3/files/" + file_id + "?alt=media"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
print(res.text)
  • In this case, the CSV data can be directly converted to the dataframe with df = pd.read_csv(io.StringIO(res.text)).

Note:

  • In the following scripts, please include the scope of https://www.googleapis.com/auth/drive.readonly and/or https://www.googleapis.com/auth/drive. When you modified the scopes, please reauthorize the scopes. By this, the modified scopes are included in the access token. Please be careful this.

Reference:

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • uhmm no, i actually want to open csv file on the drive. Sorry bout that, Will update the question for clarity – Led Sep 18 '20 at 01:33
  • 1
    @Led Thank you for replying. In your situation, a CSV file is on Google Drive, and you want to retrieve the CSV data from the CSV file. Is my understanding correct? – Tanaike Sep 18 '20 at 01:35
  • yep thats the goal. From google drive, Load the csv contents into a dataframe. – Led Sep 18 '20 at 01:36
  • 1
    @Led Thank you for replying. I could correctly understand about your goal. I apologize for my poor English skill. From your script, I had thought that you wanted to retrieve the CSV data from Google Spreadsheet. From your replying, I would like to update my answer. Could you please wait for it? – Tanaike Sep 18 '20 at 01:38
  • sure thing, i don't have any choice :D – Led Sep 18 '20 at 01:45
  • @Led Thank you for replying. I updated my answer. Could you please confirm it? – Tanaike Sep 18 '20 at 01:53
  • working on it, I got google issues. Let me get back at you – Led Sep 18 '20 at 02:24
  • thank you so much, it works! domo arigatou gozaimasu – Led Sep 18 '20 at 03:08
  • @Led Thank you for replying and testing it. I'm glad your issue was resolved. Thank you, too. – Tanaike Sep 19 '20 at 00:01