This is the code I currently have

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36'}
r = requests.get("http://www.google.com", headers=headers)
page_text = r.text
soup = BeautifulSoup(page_text, 'html.parser')
print(soup.prettify())

In theory it should send a request to Google, get the text back, and print it using BeautifulSoup's prettify() method.

Here's their example code (from http://www.crummy.com/software/BeautifulSoup/bs4/doc/#getting-help)

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.prettify())

Every time I run this code I get a codec error. Here's a screenshot of the exact error:

[screenshot of the error, image not included]

FOUND A SOLUTION

The solution is to replace print() with this print helper from a Stack Exchange member:

import sys

def uprint(*objects, sep=' ', end='\n', file=sys.stdout):
    enc = file.encoding
    if enc == 'UTF-8':
        print(*objects, sep=sep, end=end, file=file)
    else:
        f = lambda obj: str(obj).encode(enc, errors='backslashreplace').decode(enc)
        print(*map(f, objects), sep=sep, end=end, file=file)
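
To see what the helper changes, here is a small sketch that swaps the real terminal for an in-memory stream limited to ASCII. The `ascii_console` function and the sample string are assumptions made up for this demonstration; `uprint` is copied from above.

```python
import io
import sys


def uprint(*objects, sep=' ', end='\n', file=sys.stdout):
    enc = file.encoding
    if enc == 'UTF-8':
        print(*objects, sep=sep, end=end, file=file)
    else:
        f = lambda obj: str(obj).encode(enc, errors='backslashreplace').decode(enc)
        print(*map(f, objects), sep=sep, end=end, file=file)


def ascii_console():
    """Hypothetical stand-in for a terminal whose codec cannot encode every character."""
    raw = io.BytesIO()
    return raw, io.TextIOWrapper(raw, encoding='ascii', newline='\n')


text = 'caf\u00e9 \u2713'  # contains two characters that ASCII cannot encode

# Plain print() raises the codec error on the limited stream.
raw, console = ascii_console()
try:
    print(text, file=console)
    console.flush()
    plain_print_failed = False
except UnicodeEncodeError:
    plain_print_failed = True

# uprint() escapes the unencodable characters instead of raising.
raw, console = ascii_console()
uprint(text, file=console)
console.flush()
escaped = raw.getvalue().decode('ascii')
```

In a real script the call is simply `uprint(soup.prettify())`; characters the terminal cannot show come out as backslash escapes rather than crashing the program.
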
Keatinge

1 Answer

This happens when your terminal or PowerShell can't print a character it receives from BeautifulSoup. There are two ways to fix it, the first better than the second:

  1. As described in PEP 263, you can declare which encoding Python should use by typing `# coding=<encoding name>` or `# -*- coding: <encoding name> -*-` where you would put the shebang line (it must be on the first or second line of the file).
  2. Not the recommended method: at the beginning of your Python script,

    import sys
    reload(sys)
    sys.setdefaultencoding('utf8') # or whichever one you want to use.
    

    This is not the recommended method because it's really a misuse of the sys module, but it works in a pinch if your program isn't terribly complex.
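
The second method only exists on Python 2: Python 3 has no builtin `reload` and no `sys.setdefaultencoding`. As a swapped-in alternative (not what this answer proposes), the sketch below assumes Python 3.7+ and uses `TextIOWrapper.reconfigure()` to change how the output stream handles characters it cannot encode. `make_stream_safe` is a hypothetical helper name, and the ASCII stream is an in-memory stand-in for a limited terminal.

```python
import io
import sys


def make_stream_safe(stream, errors='backslashreplace'):
    """Hypothetical helper: keep the stream's encoding but escape
    characters it cannot encode instead of raising UnicodeEncodeError."""
    # reconfigure() is available on io.TextIOWrapper from Python 3.7 onward.
    stream.reconfigure(errors=errors)
    return stream


# Demonstration on an ASCII-only in-memory stream (assumed stand-in for a terminal).
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='ascii', newline='\n')
make_stream_safe(console)

print('caf\u00e9 \u2713', file=console)  # no exception; characters are escaped
console.flush()
escaped = raw.getvalue().decode('ascii')
```

In a real script, calling `make_stream_safe(sys.stdout)` once near the top should let ordinary `print()` calls go through, provided `sys.stdout` is a regular `TextIOWrapper` and has not been replaced by another object.
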

n1c9
  • well, what error is it giving you when you use either method? – n1c9 Mar 26 '16 at 00:35
  • First I added #coding=utf8 to the top of my program, same error. http://i.imgur.com/vppHB5A.png – Keatinge Mar 26 '16 at 00:37
  • try the second method – n1c9 Mar 26 '16 at 00:38
  • Then I tried the sys part. I get this problem: "NameError: name 'reload' is not defined". http://i.imgur.com/svNj5lc.png – Keatinge Mar 26 '16 at 00:39
  • after a quick check that error comes up in python3. try `from imp import reload` – n1c9 Mar 26 '16 at 00:45
  • Okay, so now I'm getting a new, different error: "module sys has no attribute 'setdefaultencoding'". http://i.imgur.com/iDuuqr7.png – Keatinge Mar 26 '16 at 00:49
  • Maybe I'm doing this in an overly confusing way. All I want to do is download a website and parse it with BeautifulSoup. Clearly I must be doing something majorly wrong, because I'm sure this is very common. – Keatinge Mar 26 '16 at 00:50
  • I'd love for you to tell me how PEP 0263 is "completely incorrect.", @PadraicCunningham – n1c9 Mar 26 '16 at 02:04