I'm a Python newbie and I'm having trouble with encoding and URLs. My goal is to download every URL listed in a text file. My script runs well, but I get errors for URLs that contain French accented characters (é, è, à, etc.).
Here is my code:
#!/usr/bin/env python
# coding: utf8
import urllib.request
import os
import codecs
import io
# Variables settings
URL = ""
finalFileName = ""
listFiles = "fichiers.txt"
nbLines = 0
currentLine = 1
# Open the file
print ("Open the source file...")
file = open(listFiles, "r")
lines = file.readlines()
# Get line numbers
for line in lines:
    nbLines += 1
file.close()
# Download the file
print ("Download the " + str(nbLines) + " files started")
# Read the file line per line
for line in lines :
    URL = line.replace("\n", "")
    finalFileName= os.path.basename(URL)
    print ("Download " + finalFileName + " [" + str(currentLine) + "/" + str(nbLines) + "]")
    # Download the file
    urllib.request.urlretrieve (URL,finalFileName)
    # Incrementing count
    currentLine += 1
print ("Done")
This is the output, ending with the error:
Download racers-saturewood-300x225.jpg [15/993]
Download _81______r-s-oil-top-finish_363.jpg [16/993]
Download traitement_thermo_traite.jpg [17/993]
Download Blanchiment-du-Douglas-exposé-NORD-150x150.jpg [18/993]
Traceback (most recent call last):
  File "D:\Bureau\images-site\dlimage.py", line 39, in <module>
    urllib.request.urlretrieve (URL,finalFileName)
  File "C:\Python34\lib\urllib\request.py", line 186, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 463, in open
    response = self._open(req, data)
  File "C:\Python34\lib\urllib\request.py", line 481, in _open
    '_open', req)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 1210, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python34\lib\urllib\request.py", line 1182, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python34\lib\http\client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1116, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Python34\lib\http\client.py", line 973, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 65-66: ordinal not in range(128)
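The last frame of the traceback shows http.client encoding the request line as ASCII, so any accented character still present in the URL raises this exception. A minimal sketch that reproduces the same check without touching the network; the URL and the helper name first_ascii_failure are made up for illustration:

```python
# Hypothetical URL, chosen only to resemble the failing file name above.
url = "http://example.com/images/Blanchiment-du-Douglas-expos\u00e9-NORD-150x150.jpg"


def first_ascii_failure(text):
    """Return (start, end) of the first span that cannot be encoded as ASCII.

    Returns None when the whole string is plain ASCII.
    """
    try:
        text.encode("ascii")  # the same call http.client makes on the request line
    except UnicodeEncodeError as exc:
        return exc.start, exc.end
    return None


span = first_ascii_failure(url)
if span is not None:
    # Show which characters would make the request fail.
    print("non-ASCII characters at", span, repr(url[span[0]:span[1]]))
```

Running this helper over each line of fichiers.txt before downloading would list the URLs that are going to fail.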
I've tried a few things to get rid of the error:
URL.encode('utf8')
(the characters are still not converted: UnicodeEncodeError: 'ascii' codec can't encode characters in position 65-66: ordinal not in range(128))
URL.decode()
(does not work)
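A short sketch of what these two calls do in Python 3, which probably explains why neither attempt changed anything: str.encode returns a new bytes object and leaves the original string untouched, and str has no decode method at all. The URL is a made-up example.

```python
url = "http://example.com/expos\u00e9.jpg"  # hypothetical URL containing "é"

# encode() does not modify `url`; it returns a separate bytes object.
encoded = url.encode("utf8")
print(type(encoded).__name__)   # the result is bytes, not str
print(type(url).__name__)       # `url` itself is still a str

# If the return value is discarded, urlretrieve still receives the
# original string with the accented character in it.
still_has_accent = "\u00e9" in url
print(still_has_accent)

# In Python 3 only bytes objects have decode(); str does not.
print(hasattr(url, "decode"))
print(hasattr(encoded, "decode"))
```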
I'm lost and I don't know how to solve this problem. Can you help me, please?
Thanks. Greetings, Arthur
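For reference, a minimal sketch of one direction that may apply here: percent-encoding the non-ASCII characters of the URL path and query with urllib.parse.quote before the request is sent. The helper name to_ascii_url and the example URL are hypothetical. The sketch assumes the server expects UTF-8 percent-escapes, that the host name is already ASCII, and that any "%" already in the URL belongs to an existing escape.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query as UTF-8.

    Assumption: the host name is plain ASCII (no IDNA handling here).
    "%" is kept as-is so already-escaped URLs are not escaped twice.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Hypothetical URL with an accented character in the file name.
example = "http://example.com/images/expos\u00e9-NORD-150x150.jpg"
converted = to_ascii_url(example)
print(converted)

# The converted URL can now be encoded as ASCII without raising.
converted.encode("ascii")
```

In the script above this would mean passing to_ascii_url(URL) to urllib.request.urlretrieve while still building finalFileName from the original URL, so the saved file keeps its accented name.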