I am learning web scraping with BeautifulSoup and Python. My first project is to extract certain recipes from cookpad.hu. The extraction itself works, but I'm having trouble writing the results to a file (CSV is all I know how to do) because of this error:
Traceback (most recent call last):
  File "cookpad_scrape.py", line 24, in <module>
    f.writerow(about_clean)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)
My code is below. I am using Python 2.7.14 on Ubuntu. A pastebin of the webpage is here, but the webpage itself is this.
I'm assuming it can't write the Hungarian letters? I'm sure there is a terribly simple solution that I am overlooking.
import requests
from bs4 import BeautifulSoup
import csv
'''
Tree of page:
<div id="recipe main">
<div id="editor" class="editor">
<div id="about">
<section id="ingredients">
<section id="steps">
'''
#text only: soup.get_text()
page = requests.get('https://cookpad.com/hu/receptek/5040119-parazson-sult-padlizsankrem')
soup = BeautifulSoup(page.text, 'lxml')
f = csv.writer(open('recipes.csv', 'w')) #create and open file in f variable, using 'w' mode
f.writerow(['Recipe 1']) #write top row headings
about = soup.find(id='about')
about_ext = about.p.extract()
about_clean = about_ext.get_text()
f.writerow(about_clean)
ingredients = soup.find(id='ingredients')
ingredients_ext = ingredients.ol.extract()
ingredients_clean = ingredients_ext.find_all(itemprop='ingredients')
#for ingredient in ingredients_clean:
steps = soup.find(id='steps')
steps_p = steps.find_all(itemprop='recipeInstructions')
for step in steps_p:
    extracted = step.p.extract()
    print(extracted.text)
    f.writerow([extracted])
Solution: run the script with Python 3 instead of Python 2, i.e. python3 my_script.py
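For anyone hitting the same error, here is a minimal sketch of why Python 3 helps: there, open() takes an encoding argument and the csv module writes text directly, so characters like 'á' need no manual encoding. (On Python 2 the usual workaround would be to call .encode('utf-8') on each value before writing.) The file name and sample text below are made up for illustration and are not taken from the real page.

```python
import csv
import os
import tempfile

# Hypothetical sample text containing Hungarian letters
about_text = 'Parázson sült padlizsánkrém'

# Write to a scratch directory so nothing in the working directory is touched
path = os.path.join(tempfile.mkdtemp(), 'recipes_demo.csv')

# Python 3: open in text mode with an explicit encoding. newline='' is what
# the csv documentation recommends so the writer controls line endings.
with open(path, 'w', encoding='utf-8', newline='') as fh:
    writer = csv.writer(fh)
    writer.writerow(['Recipe 1'])
    writer.writerow([about_text])

# Read the file back to confirm the accented characters survived
with open(path, encoding='utf-8', newline='') as fh:
    rows = list(csv.reader(fh))
```

Using a with block also closes the file when writing is done, which the bare open() call in my script never does.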
New problem: exporting the scraped data gives good results for the steps, but in the ingredients and about sections every letter is separated by commas.
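The likely cause is that writerow expects a sequence of fields, and a bare string is itself a sequence of characters, so each letter becomes its own column. Below is a minimal sketch of the difference (Python 3, writing to an in-memory buffer). The helper row_as_csv and the sample strings are hypothetical, used only to make the behaviour visible.

```python
import csv
import io

def row_as_csv(row):
    """Return the CSV line that csv.writer produces for `row`."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()

about_clean = 'krém'  # hypothetical scraped text

# A bare string is iterated character by character: one field per letter
split_line = row_as_csv(about_clean)

# Wrapping the string in a list makes it a single field
single_line = row_as_csv([about_clean])

# Several ingredients as a list of strings: one field per ingredient
ingredients = ['2 padlizsán', '1 fej hagyma']
ingredients_line = row_as_csv(ingredients)
```

In the script above that would presumably mean f.writerow([about_clean]) for the about text, and for the ingredients a list of strings built by calling get_text() on each item in ingredients_clean. The steps already work because [extracted] is a one-element list.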