I've just started learning Python. While writing a tool to download the online book "Learn Vimscript the Hard Way", I ran into a problem.
This is my code (Python 3.5):
#coding: utf-8
import urllib.request
import re
url = 'http://learnvimscriptthehardway.stevelosh.com'
name = '/chapters/16.html'
while(len(name) != 0):
    url1 = url + name
    print(url1)
    response = urllib.request.urlopen(url1)
    vim = response.read().decode('utf-8')
    address = "/Users/zhangzhimin/learnvimthehardway/" + name[-2:] + ".html"
    with open(address, "w") as f:
        f.write(vim)
    print("%s finish" % name)
    x = re.findall('''<a class="next" href="(.+?)"''', vim)
    name = x[0]
This is the result:
:!python3 test.py
http://learnvimscriptthehardway.stevelosh.com/chapters/16.html
/chapters/16.html finish
http://learnvimscriptthehardway.stevelosh.com/chapters/17.html
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    vim = response.read().decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I don't know why this happens: I can download and decode chapter 16, but the same code fails on chapter 17.
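One possible explanation, not confirmed against the server: byte 0x8b at position 1 matches the second byte of the gzip magic number (0x1f 0x8b), so the body for chapter 17 may have arrived gzip-compressed while chapter 16 did not. The sketch below assumes that is the cause. `decode_body` and `GZIP_MAGIC` are hypothetical names introduced only for illustration; the check runs offline on sample bytes rather than on the real page.

```python
import gzip

# Every gzip stream starts with these two bytes (assumed cause of the 0x8b error).
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw, charset="utf-8"):
    """Return the body as text, decompressing it first if it looks gzipped.

    Hypothetical helper: `raw` is whatever response.read() returned.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Offline demonstration with made-up page content.
page = '<a class="next" href="/chapters/18.html">'
compressed = gzip.compress(page.encode("utf-8"))

print(compressed[1] == 0x8b)              # same byte the traceback complains about
print(decode_body(compressed) == page)    # gzipped body is recovered
print(decode_body(page.encode("utf-8")) == page)  # plain body passes through
```

If the assumption holds, replacing `response.read().decode('utf-8')` in the loop with `decode_body(response.read())` should let both chapters decode. Printing `response.getheader('Content-Encoding')` for chapter 17 would be one way to confirm whether the server really sent gzip.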