I am new to Python and am trying to write a script to download a CSV file. I am using Python 3.6.1. Here's the code:

from urllib import request

demo_csv_url = 'http://www.sample-videos.com/csv/Sample-Spreadsheet-100-rows.csv'

def downloadCSV(url):
    response = request.urlopen(url)
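    csv = response.read()                # raw bytes from the server
    csvStr = csv.decode('utf-8')         # decode to text (assumes UTF-8); str(csv) would yield "b'...'"
    lines = csvStr.splitlines()
    dest = r'csv.csv'
    with open(dest, "w") as fx:          # the file is closed automatically
        for line in lines:
            fx.write(line + '\n')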


downloadCSV(demo_csv_url)

When I run the script, I get the following error:

Traceback (most recent call last):
  File "C:\Users\Vivek\Desktop\py tutorials\download_csv.py", line 23, in <module>
    downloadCSV(demo_csv_url)
  File "C:\Users\Vivek\Desktop\py tutorials\download_csv.py", line 12, in downloadCSV
    response = request.urlopen(url)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I tried adding more headers, like this:

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

and then opening the URL with response = request.urlopen(url, hdr), but that throws even more errors. Could you please let me know what I am doing wrong here? Thanks.
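A likely cause of the extra errors: the second positional parameter of urllib.request.urlopen is data (the request body), not headers, so request.urlopen(url, hdr) tries to send the dict as POST data. Custom headers are normally attached to a urllib.request.Request object instead. The following is a minimal sketch under that reading; build_request and download_csv are hypothetical helper names, and the UTF-8 fallback is an assumption for servers that declare no charset. Headers will not help if the server refuses the resource regardless of the client.

```python
from urllib import request


def build_request(url, headers=None):
    # urlopen(url, data) treats its second positional argument as the
    # request body, so custom headers belong on a Request object instead.
    return request.Request(url, headers=headers or {})


def download_csv(url, dest, headers=None):
    """Download url and save it as text to dest (hypothetical helper)."""
    req = build_request(url, headers)
    with request.urlopen(req) as response:
        # Use the charset the server declares; fall back to UTF-8 (assumption).
        charset = response.headers.get_content_charset() or 'utf-8'
        text = response.read().decode(charset)
    # newline='' writes the line endings exactly as they were received.
    with open(dest, 'w', encoding='utf-8', newline='') as fx:
        fx.write(text)
    return text
```

Usage would look like download_csv(some_url, 'csv.csv', {'User-Agent': 'Mozilla/5.0'}), where some_url is any CSV URL the server actually allows you to fetch.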

Vvk

2 Answers

That URL returns a 403 even when you visit it directly in a browser, so the server seems to be refusing access on purpose rather than your code being wrong. If you want to handle 403s, use a try/except.
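A minimal sketch of the try/except approach, using urllib.error.HTTPError, which carries the HTTP status in its code attribute. fetch_or_none is a hypothetical helper name; this version returns None on a 403 and re-raises any other HTTP error.

```python
from urllib import error, request


def fetch_or_none(url):
    """Return the response body as bytes, or None if the server answers 403."""
    try:
        with request.urlopen(url) as response:
            return response.read()
    except error.HTTPError as exc:
        if exc.code == 403:
            print('Access forbidden (403):', url)
            return None
        raise  # let other HTTP errors (404, 500, ...) propagate
```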

If the content is protected by an Auth header or Cookie, you'll need to figure out what those are and add them to the request.
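Once you know which credentials the site expects, they can be attached as ordinary request headers. A minimal sketch; build_authenticated_request is a hypothetical helper, and the "Bearer" scheme and the token/cookie values are placeholders, since the real scheme depends entirely on the site.

```python
from urllib import request


def build_authenticated_request(url, token=None, cookie=None):
    """Attach an Authorization and/or Cookie header (values are placeholders)."""
    headers = {}
    if token is not None:
        # The "Bearer" scheme is an assumption; the site may use another one.
        headers['Authorization'] = 'Bearer ' + token
    if cookie is not None:
        # e.g. 'sessionid=abc123', copied from a logged-in browser session
        headers['Cookie'] = cookie
    return request.Request(url, headers=headers)
```

The resulting object is passed to request.urlopen(...) in place of the plain URL.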

Blake O'Hare
  • Thanks Blake. I never checked the URL directly in the browser. I tried a different URL and it works fine. – Vvk Apr 05 '17 at 04:39

You need to authenticate to access this data: you need to provide a username and password of some sort.
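If the server uses HTTP Basic authentication (an assumption; it could just as well use a login form or a token), urllib can supply the username and password through a password manager and HTTPBasicAuthHandler. A minimal sketch; make_basic_auth_opener is a hypothetical helper name and the credentials are placeholders.

```python
from urllib import request


def make_basic_auth_opener(url, username, password):
    """Build an opener that answers Basic-auth challenges for url."""
    password_mgr = request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for whatever realm the server names.
    password_mgr.add_password(None, url, username, password)
    handler = request.HTTPBasicAuthHandler(password_mgr)
    return request.build_opener(handler)
```

Usage: make_basic_auth_opener(url, 'user', 'secret').open(url).read() returns the body once the server accepts the credentials.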

oshaiken
  • Thanks @oshaiken. I tried a different URL that was not throwing a 403 error, and it worked. – Vvk Apr 05 '17 at 04:41
  • Here is a list of HTTP status codes for future reference: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes – oshaiken Apr 05 '17 at 14:02