I'm trying to read a CSV file from Google Drive with pandas. Pandas reads it correctly from my computer, but when I read it from the sharing URL that Google Drive gave me, it seems to be reading something else, or Google Drive is doing something odd with the file. Here's what I did:

alread_url = 'https://drive.google.com/file/d/1am7jNHA6Lewzws_K'
pd.read_csv(alread_url, squeeze=True, error_bad_lines=False)

b'Skipping line 6: expected 1 fields, saw 2\nSkipping line 7: expected 1 fields
\nSkipping line 25: expected 1 fields, saw 2\nSkipping line 42: expected 1 fields, saw 
2\nSkipping line 43: expected 1 fields, saw 2... some more similar errors

And I get this as the DataFrame:

<!DOCTYPE html>
0   <html lang="en">
1   <head>
2   <meta charset="utf-8">
3   <meta name="google-site-verification" conten...
4   <title>Meet Google Drive – One place for all...
... ...
1647    <script type="text/javascript" nonce="3QmHtC...
1648    'https:\x2F\x2Faccounts.google.com\x2FPassiv...
1649    </script>
1650    </body>

1651    </html>
1652 rows × 1 columns

I should also mention that I ran this in a Google Colab notebook. The main goal is to read CSV and XLSX files from Google Drive without downloading them anywhere, so if you know how to do that, I don't really mind leaving this particular error unsolved.

Edit: Here's the raw text pandas is trying to read as csv:

\n<!DOCTYPE html>\n<html lang="en">\n <head>\n <meta charset="utf-8">\n <meta content="width=300, initial-scale=1" name="viewport">\n <meta name="description" content="Google Drive is a free way to keep your files backed up and easy to reach from any phone, tablet, or computer. Start with 15GB of Google storage – free.">\n <meta name="google-site-verification" content="LrdTUW9psUAMbh4Ia074-BPEVmcpBxF6Gwf0MSgQXZs">\n <title>Meet Google Drive – One place for all your files</title>\n <style>\n @font-face {\n font-family: \'Open Sans\';\n font-style: normal;\n font-weight: 300;\n src: url(//fonts.gstatic.com/s/opensans/v15/mem5YaGs126MiZpBA-UN_r8OUuhs.ttf) format(\'truetype\');\n}\n@font-face {\n font-family: \'Open Sans\';\n font-style: normal;\n font-weight: 400;\n src: url(//fonts.gstatic.com/s/opensans/v15/mem8YaGs126MiZpBA-UFVZ0e.ttf) format(\'truetype\');\n}\n </style>\n <style>\n h1, h2 {\n -webkit-animation-duration: 0.1s;\n -webkit-animation-name: fontfix;\n

Jose Luis Delgadillo
  • Google Drive is adding some metadata to your CSV file. See if this answer helps - [how to read csv from google drive](https://stackoverflow.com/questions/56611698/pandas-how-to-read-csv-file-from-google-drive-public). – AnkurSaxena Jan 25 '21 at 16:29
  • Yeah... I already tried everything in those answers and I get the same result... the URL seems to point to another file – Emilio Sánchez Vicencio Jan 25 '21 at 18:19

2 Answers

Short answer: you can't pass a Google Drive sharing URL to pd.read_csv(). You have to download the CSV file and use the actual path to it.

The Google Drive sharing URL does not point at the CSV file itself. It points at a web page (HTML content) that shows some information about the CSV file Google is hosting. That page is what pandas is parsing, which is why you see <!DOCTYPE html>... in the result.
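One quick way to confirm this is to look at the first bytes of whatever the URL returns before handing it to pandas. The sketch below uses a hypothetical helper, looks_like_html (my own name, not part of pandas or any library). It only checks for a leading HTML marker, so treat it as a rough sanity check rather than a real content-type test.

```python
def looks_like_html(payload: bytes) -> bool:
    """Return True if the payload appears to start with an HTML page."""
    # Skip leading whitespace/newlines and compare case-insensitively.
    head = payload.lstrip()[:100].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Start of the response shown in the question, versus a tiny CSV sample.
print(looks_like_html(b'\n<!DOCTYPE html>\n<html lang="en">\n <head>'))  # True
print(looks_like_html(b"name,value\nfoo,1\n"))  # False
```

In practice the payload could come from something like urllib.request.urlopen(url).read(); if the helper returns True, the URL is serving a page rather than the file.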

Locally, this works because you pass an actual file system path that pandas can read. To do the same with a remote file, you have to fetch it first so that it is available on a local file system. In general you can use the wget or curl commands, but this is not straightforward with Google Drive, because you may need to be authenticated with your Google account to access the file. There are some ideas on how to do that here and here.

The best way to download a file in Python / Jupyter notebook is to use gdown. You can install it via pip and provide your URL and it will download it for you.

# install gdown in terminal
pip install gdown

# download your file
gdown 'https://drive.google.com/uc?id=1iE1nHPJvglklttBEqX92_Mfg6421CtMq'

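Notice the URL that we're providing to gdown: it uses the uc?id=<file id> form rather than the /file/d/<file id> sharing link. Once the file is downloaded, read it from its local path: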

import pandas as pd
pd.read_csv('/path/to/file.csv')

I created an example notebook for you in Deepnote. You can do the same in a local Python REPL, in VS Code, in a Jupyter notebook, or in Google Colab.

There is a special way for you to connect to Drive from Colab by mounting Drive. More on that here.
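As a rough sketch of the mounting approach, assuming the code runs inside Colab: google.colab.drive.mount is Colab's own API, and a mounted Drive normally appears under /content/drive/MyDrive. The helper names drive_path and mount_and_read_csv, and the file path data/file.csv, are hypothetical placeholders.

```python
from pathlib import PurePosixPath

MOUNT_POINT = "/content/drive"  # conventional Colab mount point


def drive_path(relative: str, mount_point: str = MOUNT_POINT) -> str:
    """Build the path of a file inside the mounted 'MyDrive' folder."""
    return str(PurePosixPath(mount_point) / "MyDrive" / relative)


def mount_and_read_csv(relative: str):
    """Mount Drive (Colab only) and read a CSV from it with pandas."""
    # Imported lazily: google.colab only exists inside a Colab runtime.
    from google.colab import drive
    import pandas as pd

    drive.mount(MOUNT_POINT)  # prompts for authorization on first use
    return pd.read_csv(drive_path(relative))


# Hypothetical file location inside your Drive.
print(drive_path("data/file.csv"))  # /content/drive/MyDrive/data/file.csv
```

In a Colab cell you would then call mount_and_read_csv("data/file.csv"). For XLSX files, pd.read_excel on the same kind of path should work in the same way.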

Jakub Žitný

I faced a similar issue and was left scratching my head. Go to the sharing settings for the file on Google Drive, choose to share by link, and make sure access is set to anyone with the link, not only to a particular organisation. That fixed it for me.

P.S. This solution works for reading data from Google Drive on any platform (local or online).

The code for importing:

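import pandas as pd
# Sharing link, e.g. https://drive.google.com/file/d/<file id>/view?usp=sharing
url = 'url-link'
file_id = url.split('/')[-2]  # second-to-last path segment is the file ID
dwn_url = 'https://drive.google.com/uc?id=' + file_id
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
data = pd.read_csv(dwn_url, on_bad_lines='skip')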
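Taking the second-to-last path segment only works when the link ends in something like /view?usp=sharing. For a link that stops at the ID, such as the one in the question, it would return 'd' instead. A slightly more forgiving sketch using a regular expression is shown below. The function names drive_file_id and direct_download_url are hypothetical, and the patterns only cover the two URL shapes that appear in this thread.

```python
import re


def drive_file_id(url: str) -> str:
    """Extract the file ID from a Google Drive link."""
    # Shape 1: .../file/d/<id>, with or without a trailing /view?...
    match = re.search(r"/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        # Shape 2: ...?id=<id> or ...&id=<id>
        match = re.search(r"[?&]id=([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"No Drive file ID found in: {url}")
    return match.group(1)


def direct_download_url(url: str) -> str:
    """Build the uc?id=<id> form used in the answers above."""
    return "https://drive.google.com/uc?id=" + drive_file_id(url)


print(direct_download_url("https://drive.google.com/file/d/1am7jNHA6Lewzws_K"))
# https://drive.google.com/uc?id=1am7jNHA6Lewzws_K
```

The resulting URL can then be passed to pd.read_csv or gdown as in the answers, provided the file is shared with anyone who has the link.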