I have a CSV file with some URLs, which I read in Python with:
import csv
rows = []
with open("links.csv", "r", encoding="utf-8") as c:
    csv_reader = csv.reader(c)
    for row in csv_reader:
        rows.append(row)
This gives me a list with all the URLs the CSV file contains. Then I try to get an element from each page via its XPath, using "requests" and "lxml":
import requests
import lxml.html as html
def scraper():
    for i in range(len(rows)):
        try:
            article = requests.get(rows[i][0])
            if article.status_code == 200:
                artc = article.content.decode("utf-8")
                parsed = html.fromstring(artc)
                img_url = parsed.xpath(URL_1)
                img_url.append(img_links)
            else:
                raise ValueError(f"Error: {article.status_code!r}")
        except ValueError as ve:
            print(ve)
Now the problem is that when I run this program, the following errors appear:
No connection adapters were found for '\ufeffhttps://detail.1688.com/offer/524898885299.html?spm=a26352.b28411319.offerlist.290.ad1b1e625GXj5E'
'utf-8' codec can't decode byte: invalid start byte
As a note: all these links are from Chinese web pages such as 1688 or Taobao, which makes me think the problem has something to do with encoding. I've tried using "utf-8-sig". This solves the '\ufeff' problem but does not solve the "can't decode byte" error.
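For reference, here is a minimal sketch that reproduces both symptoms without any network access. It rests on assumptions: the URL is a placeholder, and the page body is a made-up snippet encoded as GBK. I have not confirmed that GBK is what these sites actually serve; it is only a common legacy encoding for Chinese pages.

```python
import codecs
import csv
import io

# A CSV saved with a UTF-8 byte order mark (BOM); the URL is a placeholder.
raw_csv = codecs.BOM_UTF8 + b"https://example.com/offer/1.html\n"

# Decoding as "utf-8" keeps the BOM as '\ufeff' at the start of the first cell.
with_bom = next(csv.reader(io.StringIO(raw_csv.decode("utf-8"))))[0]

# Decoding as "utf-8-sig" strips the BOM, so the URL starts with "https".
without_bom = next(csv.reader(io.StringIO(raw_csv.decode("utf-8-sig"))))[0]

# Hypothetical page body encoded as GBK (an assumption, not confirmed).
page_bytes = "<html><body><p>中文</p></body></html>".encode("gbk")

# Decoding GBK bytes as UTF-8 raises UnicodeDecodeError, a ValueError subclass,
# which is why an `except ValueError` handler would catch and print it.
try:
    page_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError as err:
    utf8_failed = True
    print(err)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = page_bytes.decode("gbk")
```

If the real pages are indeed not UTF-8, the hard-coded `.decode("utf-8")` in `scraper()` would be the step that fails, independently of how the CSV file is opened.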