Python: UnicodeEncodeError when saving text to a file

Question

I'm trying to extract the text from a web page and save it to a file, but I get this error:

Traceback (most recent call last):
  File "C:\Users\username\Desktop\Python\parsing.py", line 21, in
    textFile.write(str(results))
UnicodeEncodeError: 'cp949' codec can't encode character '\xa9' in position 37971: illegal multibyte sequence

I searched and tried `textFile.write(str(results).decode('utf-8'))`, but that raises an AttributeError instead.

import requests
import os
from bs4 import BeautifulSoup

outputFolderName = "output"

currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName

r = requests.get('https://yahoo.com/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)

try :
    os.mkdir(outputDir)
    print("output directory generated")
except :
    print("using existing directory")

textFile = open(outputDir + '/output.txt', 'w')
textFile.write(str(results))
textFile.close()

Is there a way to change the encoding used for str(results) so that it is saved correctly?

Python version is 3.7.3.

Possible duplicate of [How to correctly parse UTF-8 encoded HTML to Unicode strings with BeautifulSoup?](https://stackoverflow.com/questions/20205455/how-to-correctly-parse-utf-8-encoded-html-to-unicode-strings-with-beautifulsoup) — walnut, Sep 10 '19 at 10:41
Where does the cp949 codec come from? Can you post the full stacktrace? — Tom Dalton, Sep 10 '19 at 10:47
@TomDalton Traceback (most recent call last): File "C:\Users\username\Desktop\Python\parsing.py", line 21, in textFile.write(str(results)) UnicodeEncodeError: 'cp949' codec can't encode character '\xa9' in position 37971: illegal multibyte sequence — FlippingFlop, Sep 10 '19 at 11:34
Please include the traceback in the question body (use the "edit" link below the tags). Also: which Python version are you using? The meaning of `str()` has changed significantly from Python 2 to Python 3. — lenz, Sep 10 '19 at 11:38
I get no error with this code. Can you please provide the full code? — Pitto, Sep 10 '19 at 11:48
@Pitto it is the whole code. maybe it is because i put the url 'example.com'. can you try this code again? i've just modified. — FlippingFlop, Sep 10 '19 at 11:58
Possible duplicate of [UnicodeEncodeError: 'cp949' codec can't encode character](https://stackoverflow.com/questions/43821262/unicodeencodeerror-cp949-codec-cant-encode-character) — Tom Dalton, Sep 10 '19 at 12:18
I think this is related to your system default encoding being cp949 instead of e.g. utf-8. As the answer below suggests, explicitly setting the file's encoding to utf8 will probably solve the issue. See https://stackoverflow.com/a/43821283/2372812 for more info. — Tom Dalton, Sep 10 '19 at 12:18
@TomDalton omg!! what a simple way to solve it !! thanks. and thanks to all others commented :) — FlippingFlop, Sep 10 '19 at 12:28

score 1 · Accepted Answer · edited Sep 10 '19 at 12:32

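Specify the encoding explicitly when opening the file. Without an `encoding` argument, `open()` uses the system's default encoding (cp949 here), which cannot represent '\xa9' (the © sign). For example: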

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import os
from bs4 import BeautifulSoup

outputFolderName = "output"

currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName

r = requests.get('https://yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)

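# Create the output directory; reuse it if it already exists
try:
    os.mkdir(outputDir)
    print("output directory generated")
except FileExistsError:
    print("using existing directory")

# Pass encoding explicitly so the system default (cp949 here) is not used
with open(outputDir + '/output.txt', mode='w', encoding='utf8') as textFile:
    textFile.write(str(results))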
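
To see why this helps without hitting the network, here is a minimal sketch that reproduces the failure on a short string and then writes it with an explicit encoding. The sample text, the temporary directory, and the variable names are my own assumptions for illustration; only the '\xa9' character and the cp949 codec come from the traceback in the question. It also shows why the `.decode('utf-8')` attempt failed: in Python 3 a `str` has no `decode()` method.

```python
import locale
import os
import tempfile

# Hypothetical sample text containing '\xa9' (the copyright sign from the traceback).
text = "Copyright \xa9 2019"

# The encoding open() normally falls back to when none is given
# (cp949 on the asker's machine; it may differ on yours).
print("default encoding:", locale.getpreferredencoding(False))

# Encoding to cp949 fails for this character, as reported in the question.
try:
    text.encode("cp949")
    cp949_ok = True
except UnicodeEncodeError as exc:
    cp949_ok = False
    print("cp949 failed:", exc)

# In Python 3, str has no decode() method (only bytes does),
# which is why str(results).decode('utf-8') raised AttributeError.
has_decode = hasattr(text, "decode")

# Writing with an explicit UTF-8 encoding does not depend on the system default.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
with open(path, mode="w", encoding="utf-8") as text_file:
    text_file.write(text)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as text_file:
    round_trip = text_file.read()
```

Whatever later reads output.txt should also open it as UTF-8, otherwise the © character may show up garbled.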

Hi @FlippingFlop! If my answer was useful please don't forget to upvote and / or choose it as answer. Thanks! — Pitto, Sep 15 '19 at 18:53
