I'm trying to download a text file using Python and sockets, but an error occurs when I decode the content I receive.
I'm using Python 3 and running test.py on Windows, trying to fetch the content of http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt with this command:
python .\test.py linux.vbird.org 80 /linux_basic/0330regularex/regular_express.txt
# this file is named test.py
import socket
import sys
host = sys.argv[1]
port = sys.argv[2]
filename = sys.argv[3]
# creating a socket, using ipv4
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# connecting
s.connect((host, int(port)))
print("Connecting successful!\n")
str = "GET %s HTTP/1.0\r\n\r\n" % filename
s.sendall(str.encode('utf-8'))
while 1:
    try:
        buf = s.recv(2048)
    except socket.error as e:
        print("Error receiving data: %s" % e)
        sys.exit(1)
    if not len(buf):
        break
    sys.stdout.write(buf.decode('utf-8'))
I expected to get the content of the given URL, namely the content of the text file. Instead, I get the following error message:
Connecting successful!
Traceback (most recent call last):
  File ".\test.py", line 22, in <module>
    sys.stdout.write(buf.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 275: invalid start byte
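
The same error seems reproducible without any network access. Below is a minimal sketch, assuming the response contains at least one byte that is not valid UTF-8 (I have not confirmed the actual encoding of the file). `try_decode` and the `sample` bytes are hypothetical and not part of test.py; only the byte 0xb3 is taken from the traceback.

```python
def try_decode(buf, encoding="utf-8"):
    """Decode strictly; on failure, return a lossy decoding plus the error.

    Hypothetical helper for illustration only, not part of test.py.
    """
    try:
        return buf.decode(encoding), None
    except UnicodeDecodeError as e:
        # errors="replace" substitutes U+FFFD for each undecodable byte
        # instead of raising, so the rest of the text is still readable.
        return buf.decode(encoding, errors="replace"), e


# Made-up sample: ASCII text with the byte from the traceback in the middle.
sample = b"abc\xb3def"
text, err = try_decode(sample)
print(repr(text))
print(err)
```

This only shows where decoding fails and lets the rest of the data print. It does not recover the original characters, which would presumably need the encoding the server actually uses.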