My Code

import requests
from bs4 import BeautifulSoup

url = "http://www.quikr.com/jobs/direct-hiring-for-fresher-b.tech-diploma-iti-for-maruti-suzuki-gurgaon-W0QQAdIdZ293462666"
encode = 'utf-8'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "DNT": "1",
    "Upgrade-Insecure-Requests": "1"
}
response = requests.get(url, headers=headers)
encodeData = response.text.encode(encode)
soup = BeautifulSoup(encodeData)
print soup.prettify()

I am trying to scrape an HTML page with this very basic code, but I still get an error when I call prettify().

The error is:

UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 7

jwodder
Firdoesh
  • I tried to run your code and it works completely fine (py2, py3 both) – Arpit Solanki Jun 11 '17 at 19:48
  • I don't know, I am still getting the error. However, after removing prettify() it works fine. – Firdoesh Jun 11 '17 at 19:53
  • Can you tell me which platform you are running on? – Arpit Solanki Jun 11 '17 at 19:55
  • Windows. Seems like everything is working except prettify() – Firdoesh Jun 11 '17 at 20:03
  • sorry i can't help you then, i am a linux guy and don't have windows. – Arpit Solanki Jun 11 '17 at 20:04
  • Setting the encoding to utf-8 might work. https://stackoverflow.com/questions/32382686/unicodeencodeerror-charmap-codec-cant-encode-character-u2010-character-m – pmuntima Jun 11 '17 at 20:17
  • What version of python are you using? I've tested your code on Python 2.7.11 and 3.5.2 and it works OK. – mx0 Jun 11 '17 at 20:23
  • If it really is just `prettify` that's giving you grief then replace that line with a construction that calls `BeautifulSoup` with something like this: `BeautifulSoup(response.text.encode(encode, 'replace'))`. Bizarre characters will be cheerfully ignored. – Bill Bell Jun 11 '17 at 20:51

1 Answer

This is a common problem. The issue probably isn't with your code, but with whatever console you're printing to. BeautifulSoup's prettify() returns a unicode string, and print has to encode it to the console's own encoding, which often can't represent every character (for example, I get this error a lot when I print a soup in Sublime Text). Encoding the string yourself before printing (to UTF-8, or to ASCII with an error handler such as 'replace') should do the trick.

print soup.prettify().encode('utf-8')

I haven't tested this, but it may fix it for you.
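To illustrate what is probably happening, here is a minimal sketch in Python 3 syntax (the code in the question is Python 2). It assumes the console uses code page 437, which is only a guess at the asker's setup; the sample string and the helper name `encode_for_console` are made up for illustration and are not part of BeautifulSoup.

```python
# Sample text containing the copyright sign (u'\xa9') from the error message.
text = u"\xa9 2017 Example"


def encode_for_console(s, encoding, errors="replace"):
    """Hypothetical helper: encode s, substituting '?' for unmappable characters."""
    return s.encode(encoding, errors)


# A strict encode to a legacy code page fails, much like printing to such a console.
# cp437 is an assumed console encoding; it has no mapping for the copyright sign.
try:
    text.encode("cp437")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# UTF-8 can represent every Unicode character, so this never raises.
utf8_bytes = text.encode("utf-8")

# With errors="replace", the unmappable character becomes '?' instead of raising.
replaced = encode_for_console(text, "cp437")

print(strict_ok)
print(repr(utf8_bytes))
print(repr(replaced))
```

Note that writing UTF-8 bytes to a console that expects a different code page can still show garbled characters; it avoids the exception rather than guaranteeing correct display.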

Nolan Conaway