
So I wrote some simple code to read a CSV file in Python 3 using pandas:

```python
import pandas as pd

df = pd.read_csv('https://www.goodreads.com/review_porter/export/153331182/goodreads_export.csv', on_bad_lines='skip')

print(df)
```
Instead of the CSV contents, I ended up with this:
```
<!DOCTYPE html>
0                                               <html>
1                                               <head>
2                               <title>Sign Up</title>
3    <meta content='telephone=no' name='format-dete...
4    <link href='https://www.goodreads.com/user/sig...
..                                                 ...
255                                                  }
256                                              //]]>
257                                          </script>
258                                            </html>
259  <!-- This is a random-length HTML comment: xme...

[260 rows x 1 columns]
```

Can someone help me understand why it doesn't work in this particular case? I tried another .csv file and it worked just fine. The site I'm using is https://www.goodreads.com/ and the .csv file comes from its export section.

  • Have you visited that URL? It doesn't open a CSV file for me. It opens the exact HTML page that you're seeing. Maybe you need to provide some authentication/authorization (a header value, a cookie value, etc.) as part of the web request? – David Jul 12 '22 at 12:23
  • https://www.goodreads.com/review_porter/export/153331182/goodreads_export.csv is the URL I used in the code, and it does open a .csv file. – Robert Sofianu Jul 12 '22 at 12:28
  • @RobertSofianu It's opening a CSV file for you in the browser because you've logged in to your GoodReads account. Your Python script will not have logged in. – AKX Jul 12 '22 at 12:29
  • This may help: https://stackoverflow.com/questions/33039327/handling-http-authentication-when-accesing-remote-urls-via-pandas – JayPeerachai Jul 12 '22 at 12:30

1 Answer


That's because the link requires you to be authenticated before you can access the CSV file. Since you haven't passed any authentication, the server returned its sign-up page instead, and pandas parsed that HTML as if it were CSV.
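To see why the result looks like a one-column table instead of an error, here is a small offline sketch. The HTML string is a made-up stand-in for the sign-up page, not the real Goodreads markup. pandas treats the first line (`<!DOCTYPE html>`) as a one-field header; any later line that happens to contain a comma has more fields than that header, so `on_bad_lines='skip'` silently drops it, while the default (`'error'`) raises a `ParserError`.

```python
import io

import pandas as pd

# Made-up stand-in for an HTML sign-up page (not the real Goodreads markup)
fake_html = """<!DOCTYPE html>
<html>
<meta content='telephone=no' name='format-detection'>
<title>Sign Up</title>
<script>var x = [1, 2];</script>
</html>
"""

# With on_bad_lines='skip', the <script> line (it contains a comma, so 2 fields) is dropped silently
skipped = pd.read_csv(io.StringIO(fake_html), on_bad_lines="skip")
print(skipped)

# With the default on_bad_lines='error', the same input fails loudly instead
try:
    pd.read_csv(io.StringIO(fake_html))
    strict_failed = False
except pd.errors.ParserError as exc:
    strict_failed = True
    print("ParserError:", exc)
```

In other words, `on_bad_lines='skip'` hid the symptom here; without it, pandas would most likely have refused to parse the page at all.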

You can try this:

```python
import requests

# url, username and password must be defined beforehand
response = requests.get(url, auth=(username, password), verify=False)
```
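The snippet above only downloads the page; to end up with a DataFrame you still have to hand the response body to pandas. Below is a minimal sketch of that step. It assumes the server accepts HTTP Basic auth at this URL, which I have not confirmed for Goodreads (it may require a logged-in browser session instead), so it also checks whether the body is really HTML before parsing. The helper names `looks_like_html`, `csv_text_to_dataframe` and `fetch_export` are hypothetical, and `verify` is left at its default (`True`) so certificates are still checked.

```python
import io

import pandas as pd
import requests


def looks_like_html(text: str) -> bool:
    """Heuristic: a sign-in/sign-up page starts with a doctype or an <html> tag."""
    head = text.lstrip()[:100].lower()
    return head.startswith(("<!doctype html", "<html"))


def csv_text_to_dataframe(text: str) -> pd.DataFrame:
    """Parse CSV text, but refuse input that is really an HTML page."""
    if looks_like_html(text):
        raise ValueError("received HTML instead of CSV; authentication probably failed")
    return pd.read_csv(io.StringIO(text))


def fetch_export(url: str, username: str, password: str) -> pd.DataFrame:
    """Download the export and parse it (hypothetical helper)."""
    # Assumption: the server accepts HTTP Basic auth here; it may need a logged-in session instead.
    # verify is left at its default (True), so TLS certificates are still checked.
    response = requests.get(url, auth=(username, password), timeout=30)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return csv_text_to_dataframe(response.text)
```

Usage would look like `df = fetch_export("https://www.goodreads.com/review_porter/export/153331182/goodreads_export.csv", "your-email", "your-password")`. If authentication is not accepted, you get a clear `ValueError` instead of a DataFrame full of HTML.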

Alternatively, if you download the CSV file manually (from your logged-in browser session) and read the local file with pandas, that should work too.
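A minimal sketch of the manual-download route: once the browser has saved the export, pandas reads it from disk and no authentication is involved. The path below (`~/Downloads/goodreads_export.csv`) and the helper name `load_export` are hypothetical; point them at wherever your browser actually saved the file.

```python
from pathlib import Path

import pandas as pd

# Hypothetical path: wherever your browser saved the export while you were logged in
EXPORT_PATH = Path.home() / "Downloads" / "goodreads_export.csv"


def load_export(path: Path = EXPORT_PATH) -> pd.DataFrame:
    """Read a locally saved export; a local file needs no authentication."""
    return pd.read_csv(path)
```

Usage: `df = load_export()` followed by `print(df.head())`. If the file is missing, `pd.read_csv` raises `FileNotFoundError`.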

PalMaxone
  • I tried this and now it's giving me another error: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.goodreads.com'. Adding certificate verification is strongly advised. – Robert Sofianu Jul 12 '22 at 13:05
  • For security, you need to pass the certificate path as `verify="your/certificate/path"` when sending the request. – PalMaxone Jul 13 '22 at 04:39
  • Refer to this link for more details on giving the certificate path: https://github.com/4teamwork/ftw.linkchecker/issues/57 – PalMaxone Jul 13 '22 at 04:40
  • Or you can disable the warning instead; refer to https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings – PalMaxone Jul 13 '22 at 04:43
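The `InsecureRequestWarning` comes from `verify=False`, which switches certificate checking off. In requests, `verify` defaults to `True` (checking against the CA bundle from the `certifi` package), and it also accepts a path to your own CA bundle, which is what the comment above suggests. The sketch below keeps verification on via a `requests.Session`; the helper names and the example bundle path are hypothetical. Silencing the warning with `urllib3.disable_warnings` is shown separately, because it only hides the message and leaves requests made with `verify=False` unverified.

```python
from typing import Optional

import requests
import urllib3


def make_session(ca_bundle: Optional[str] = None) -> requests.Session:
    """Session that keeps TLS certificate verification switched on."""
    session = requests.Session()
    # Session.verify defaults to True (certifi's CA bundle);
    # a string is used as the path to a custom CA bundle file or directory
    if ca_bundle is not None:
        session.verify = ca_bundle
    return session


def silence_insecure_warning() -> None:
    """Only hides InsecureRequestWarning; requests sent with verify=False stay unverified."""
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
```

Usage: `session = make_session()` and then `response = session.get(url, auth=(username, password), timeout=30)`, where `url`, `username` and `password` are your own values. For a public site like this one, the default verification should normally be enough, so dropping `verify=False` is usually the simplest fix.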