CSV file not working properly

Question

So I wrote some simple code to read a CSV file in Python 3.0 using pandas:

import pandas as pd

df = pd.read_csv('https://www.goodreads.com/review_porter/export/153331182/goodreads_export.csv', on_bad_lines='skip')

print(df)

And instead of the contents of the CSV file, I ended up with this:
<!DOCTYPE html>
0                                               <html>
1                                               <head>
2                               <title>Sign Up</title>
3    <meta content='telephone=no' name='format-dete...
4    <link href='https://www.goodreads.com/user/sig...
..                                                 ...
255                                                  }
256                                              //]]>
257                                          </script>
258                                            </html>
259  <!-- This is a random-length HTML comment: xme...

[260 rows x 1 columns]

Can someone help me understand why it is not working in this particular case? I tried another .csv and it worked just fine. The site I use is https://www.goodreads.com/ and the .csv file is from the export section.

Have you visited that URL? It doesn't open a CSV file for me. It opens the exact HTML page that you're seeing. Maybe you need to provide some authentication/authorization (a header value, a cookie value, etc.) as part of the web request? – David Jul 12 '22 at 12:23
https://www.goodreads.com/review_porter/export/153331182/goodreads_export.csv is the URL that I used in the code, and it opens a .csv file. – Robert Sofianu Jul 12 '22 at 12:28
@RobertSofianu It's opening a CSV file for you in the browser because you've logged in to your GoodReads account. Your Python script will not have logged in. – AKX Jul 12 '22 at 12:29
This may help: https://stackoverflow.com/questions/33039327/handling-http-authentication-when-accesing-remote-urls-via-pandas – JayPeerachai Jul 12 '22 at 12:30

score 0 · Accepted Answer · answered Jul 12 '22 at 12:38

That's because the link requires you to be authenticated before you can access the CSV file. Since your script has not passed any authentication, it just reads the sign-up page and displays its HTML.
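
One way to confirm this from the script itself is to look at the downloaded text before handing it to pandas. The following is only a minimal sketch: looks_like_html is a hypothetical helper name, and the two sample strings are made-up stand-ins for a login page and a CSV export.

```python
def looks_like_html(text: str) -> bool:
    """Return True if the text appears to be an HTML page rather than CSV data."""
    head = text.lstrip().lower()[:200]
    return head.startswith("<!doctype html") or head.startswith("<html")


# Hypothetical sample bodies: the first mimics the page shown in the question,
# the second is an invented two-line CSV.
login_page = "<!DOCTYPE html>\n<html>\n<head>\n<title>Sign Up</title>"
csv_export = "Title,Author\nDune,Frank Herbert\n"

print(looks_like_html(login_page))  # True
print(looks_like_html(csv_export))  # False
```

If the check returns True for the text your script received, the request was most likely answered with the sign-up page instead of the file.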

You can try this:

import requests
# url, username and password are placeholders for your own values
response = requests.get(url, auth=(username, password), verify=False)

Alternatively, download the CSV file in your browser and read the local copy; that should work too.
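
Reading a downloaded copy might look roughly like the sketch below. The file name goodreads_export.csv comes from the URL in the question; the sample columns and row are invented, and the temporary directory exists only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the export saved from the browser; the columns and the
# single row are made up for this example.
sample = "Title,Author,My Rating\nDune,Frank Herbert,5\n"

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "goodreads_export.csv"
    csv_path.write_text(sample, encoding="utf-8")

    # With a real export, only this call is needed, pointed at the saved file.
    df = pd.read_csv(csv_path, on_bad_lines="skip")

print(df.shape)  # (1, 3)
```

Because the file is read from disk, no login is involved on the Python side.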

PalMaxone


I tried this and now it is giving me another error: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.goodreads.com'. Adding certificate verification is strongly advised. – Robert Sofianu Jul 12 '22 at 13:05
For security, pass the path to your certificate as `verify="path/to/certificate"` when sending the request. – PalMaxone Jul 13 '22 at 04:39
Refer to this link for more detail on providing the certificate path: https://github.com/4teamwork/ftw.linkchecker/issues/57 – PalMaxone Jul 13 '22 at 04:40
Or you can disable the warning. Refer to https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings – PalMaxone Jul 13 '22 at 04:43
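
The warning comes from passing verify=False. As a hedged sketch of the alternative: requests verifies certificates by default, so leaving that argument out should avoid the warning without any extra configuration. The CA bundle path in the comment is a hypothetical placeholder, relevant only for servers that use a private certificate authority.

```python
import requests

session = requests.Session()

# requests verifies HTTPS certificates by default, so simply not passing
# verify=False keeps verification on and avoids InsecureRequestWarning.
print(session.verify)  # True

# Only for a server with a private certificate authority: point verify at
# a CA bundle file. The path below is a hypothetical placeholder.
# session.verify = "/path/to/ca-bundle.pem"
```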
