I'm writing the CSV table produced by my scraping process like this:
import csv

with open("Raw_table.csv", 'w', encoding="utf-8") as outfile:
    csv_writer = csv.writer(outfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
Usually, when I want to use the data, I read it back with a small CSV parser like this:
def parse_csv(content, delimiter=';'):
    csv_data = []
    for line in content.split('\n'):
        # Split each line on the delimiter and strip surrounding whitespace
        csv_data.append([x.strip() for x in line.split(delimiter)])
    return csv_data

with open('Raw_RC.csv', 'r', encoding="utf-8") as infile:
    list_raw = parse_csv(infile.read())
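For comparison, here is a minimal sketch of reading the file back with the csv module instead of splitting lines by hand. It assumes the same delimiter and quote character as the writer above. `read_rows` and the temporary sample file are hypothetical names used only for illustration, not part of my actual script.

```python
import csv
import os
import tempfile

def read_rows(path, delimiter=";", quotechar="|"):
    """Read a UTF-8 CSV file and return a list of rows with stripped cells."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(path, "r", encoding="utf-8", newline="") as infile:
        reader = csv.reader(infile, delimiter=delimiter, quotechar=quotechar)
        return [[cell.strip() for cell in row] for row in reader]

# Round-trip check with non-ASCII text (sample data, not from the real scrape)
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", encoding="utf-8", newline="") as outfile:
    writer = csv.writer(outfile, delimiter=";", quotechar="|",
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["Köln", "café; crème", "año"])

rows = read_rows(sample_path)
print(rows)
```

Unlike a plain `split(';')`, the reader keeps a quoted field that contains the delimiter as a single cell.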
It works when I scrape websites from the USA or England.
Now I have to deal with French, Spanish and German content, and I get the following error when trying to read the CSV with parse_csv:
csv_writer.writerow([k] + v)
'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)
How can I fix this?
Subsidiary questions:
- Should I encode the CSV differently, or scrape the site another way (e.g. configure BeautifulSoup differently) when the content is German or French?
- Could this encoding problem be related to all the \xa0 characters I get from scraping? I don't think so, because I can parse the UK and USA CSVs even though they are also full of them.
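To check the \xa0 theory, here is a small sketch of how these characters look as UTF-8 bytes (the variable names are mine, for illustration only). A non-breaking space encodes with a leading 0xc2 byte, while an accented letter such as é encodes with a leading 0xc3 byte, which is the byte named in the error, so the \xa0 characters alone would not obviously explain it.

```python
nbsp = "\xa0"      # NO-BREAK SPACE, U+00A0
e_acute = "\xe9"   # é, U+00E9

nbsp_utf8 = nbsp.encode("utf-8")
e_acute_utf8 = e_acute.encode("utf-8")
print(nbsp_utf8)     # b'\xc2\xa0'
print(e_acute_utf8)  # b'\xc3\xa9'

# Decoding those UTF-8 bytes as ASCII fails with the same kind of message
try:
    e_acute_utf8.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

print(ascii_error)
```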
Every byte of your time spent on this is appreciated! :)