
I have the following code which I got from this tutorial:

from bs4 import BeautifulSoup
import requests
req=requests.get("http://www.aflcio.org/Legislation-and-Politics/Legislative-Alerts")
data=req.text
soup=BeautifulSoup(data)
letters=soup.find_all("div",class_="ec_statements")
print(letters)

I am getting the following error:

Traceback (most recent call last):
  File ".\scr3.py", line 7, in <module>
    print(letters)
  File "C:\Users\adi\AppData\Local\Programs\Python\Python35\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 7787: character maps to <undefined>

If I try to apply an encode('utf-8') method to the letters object, I get an attribute error saying "ResultSet object has no attribute encode".

Does anyone know a workaround to print the letters object? I am using Python 3.5 and BeautifulSoup 4 on Windows 7.

multigoodverse

2 Answers


Answering my own question.

I was running the Python script from the Windows command line. It turns out the console could not encode some characters in the printed output with its code page (cp437, according to the traceback). I realized this when I redirected the output to a text file:

python script.py > text.txt

That didn't throw an error. Alternatively, if I still want to display the output in the command line, I can first switch the console code page to UTF-8:

chcp 65001 

And then execute the script.
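If changing the code page is not an option, another way to avoid the exception is to replace the characters the console cannot encode before printing. The following is only a minimal sketch: the helper name `to_console_safe` and the sample string are made up for illustration, and cp437 is used because it is the codec named in the traceback.

```python
import sys


def to_console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Fall back to whatever encoding Python reports for stdout (cp437 in the traceback).
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Encode with errors="replace", then decode back so print() receives a str.
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample containing U+2019, the character from the error message.
sample = "It\u2019s a letter"
print(to_console_safe(sample, "cp437"))
```

This loses the original character, so it is a display workaround rather than a fix; redirecting to a file or changing the code page keeps the text intact.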

multigoodverse

This happens because find_all() returns a ResultSet, a list-like collection of elements, rather than a single string. Use a for loop to iterate through each returned element:

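from bs4 import BeautifulSoup
import requests

req = requests.get("http://www.aflcio.org/Legislation-and-Politics/Legislative-Alerts")
data = req.text
# Name the parser explicitly so bs4 does not have to guess (html.parser ships with Python)
soup = BeautifulSoup(data, "html.parser")
letters = soup.find_all("div", class_="ec_statements")
for letter in letters:
    # encode() works on each element; on Python 3 this prints a bytes literal (b'...')
    print(letter.encode('utf-8'))
    # .text is the element's text as a Unicode string
    print(letter.text)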

You can also use the .text property, which gives you the text of the found element as a Unicode string. Printing it still depends on what the console can encode.

When using .find() you get a single element as a result and can print it out. .findAll() (usually written as .find_all() in current BeautifulSoup) returns a ResultSet of elements, which has no .encode() method because it is a list of elements rather than a string.
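The difference can be seen without BeautifulSoup at all. In this rough sketch a plain list of strings stands in for the ResultSet (an assumption made so the example runs without network access; the sample strings are invented):

```python
# Stand-in for the ResultSet returned by find_all(); the data is hypothetical.
letters = ["First letter\u2019s text", "Second letter"]

# The container itself has no encode() method, which is what the AttributeError reports.
print(hasattr(letters, "encode"))

# Each item inside it does, so encoding has to happen per element.
print(hasattr(letters[0], "encode"))

encoded = [item.encode("utf-8") for item in letters]
```

The same per-element pattern applies to the loop above, where each item is a tag rather than a plain string.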

Another possible solution is to write # -*- coding: utf-8 -*- at the beginning of your script.

print(letters) worked for me after writing # -*- coding: utf-8 -*- at the beginning of the script. It also worked for me without it, but I am using Python 2.7.

Alastair McCormack
Ivan Bilan
  • Same UnicodeEncodeError. I also get such an error when I use prettify. Using # -*- coding: utf-8 -*- didn't solve anything. The setdefaultencoding method also doesn't exist in Python 3 because the default is already utf-8, as this answer suggests: http://stackoverflow.com/questions/28127513/attributeerror-module-object-has-no-attribute-setdefaultencoding. I found out the issue was with the Windows command line, so I am providing an answer above. – multigoodverse Dec 19 '15 at 10:48
  • Without reload(sys) it will not work! – dsgdfg Dec 22 '15 at 14:14
  • I've removed `setdefaultencoding()` from your answer as it's a nasty hack, which masks other issues. It should only be used by people who know what it does – Alastair McCormack Dec 25 '15 at 12:35
  • `# -*- coding: utf-8 -*-` only affects how non-ASCII characters written into the source code are interpreted. It has no effect on strings read from external sources. – Alastair McCormack Dec 25 '15 at 12:37
  • You should avoid encoding within a print statement to avoid double encoding. Problems printing to the console should be fixed by the environment – Alastair McCormack Dec 25 '15 at 12:39
  • Thanks a lot for the tips – Ivan Bilan Dec 25 '15 at 14:23
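Two of the points raised in the comments can be illustrated with a short sketch. The byte string below is invented for the example; it contains the UTF-8 encoding of U+2019, the character from the traceback. Text from an external source is decoded with an explicitly named encoding, regardless of any coding declaration in the script, and encoding again before printing shows a bytes literal on Python 3 instead of the text.

```python
# Bytes as they might arrive from an external source (hypothetical sample).
raw = b"it\xe2\x80\x99s"

# Decoding uses the encoding named here, not the script's coding declaration.
text = raw.decode("utf-8")
print(len(raw), len(text))

# What print(text.encode("utf-8")) would display on Python 3: the bytes repr, not the text.
as_printed = str(text.encode("utf-8"))
print(as_printed)
```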