
I am crawling a particular URL from google.com, but I get the following error:

'utf8' codec can't decode byte 0xc3 in position 72: invalid continuation byte

Code:

import re
import os
import MySQLdb
import codecs
import requests
import base64
import random
import gzip
import time
from multiprocessing.pool import Pool
import datetime

import sys
reload(sys)
sys.setdefaultencoding('utf-8')
def proxy_mesh():
    while True:
        try: 

            data = requests.get('google.com')

            print data.text.encode('utf-8')
        except Exception, e:
            print e
            print "Trying again"
            time.sleep(3)
proxy_mesh()

What is the fix, and how can I overcome this error?

Mounarajan
  • In other words, you're trying to decode using `utf-8` while the encoding was done differently. – Leb Mar 23 '16 at 01:33
  • Can you give the traceback? This could be occurring implicitly in several places. – ShadowRanger Mar 23 '16 at 01:37
  • @Mounarajan as suggested in the link I provided, you need to use different encoding. Can't tell you which one without more information. – Leb Mar 23 '16 at 01:41
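
The mismatch described in the comments above can be reproduced without any network access. The following is a minimal sketch, assuming the offending bytes were written in Latin-1; the sample bytes and the helper name `try_decode` are hypothetical and not taken from the question.

```python
# Hypothetical sample: the byte 0xc3 followed by an ASCII letter.
# In Latin-1, 0xc3 is a complete character (A with tilde).
raw = b"N\xc3O-RESPONDER"

def try_decode(data, encoding):
    """Return the decoded text, or a short failure message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc.reason

# In UTF-8, 0xc3 opens a two-byte sequence, but "O" is not a continuation byte.
utf8_result = try_decode(raw, "utf-8")
# Latin-1 maps every byte to a code point, so this decode cannot fail.
latin1_result = try_decode(raw, "latin-1")

print(utf8_result)
print(repr(latin1_result))
```

Latin-1 never raises, so a successful decode does not prove it is the right codec; it only shows that the bytes are not valid UTF-8.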

1 Answer


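Keep it simple and it works. The requests module has already decoded the response body, so `data.text` is a Unicode string. There is no need for `reload(sys)`, `sys.setdefaultencoding('utf-8')`, or an explicit `.encode('utf-8')` before printing.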

import requests
data = requests.get('https://www.whoisxmlapi.com/whoisserver/WhoisService?domainName=http://N%E2%94%9CO-RESPONDER@MERCAOLIVRE.COM&outputFormat=json')
print data.text

Since it is a JSON response, you may also want to process it:

import json
print json.loads(data.text)
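
As a rough illustration of why no manual decoding is needed at this step, here is a small sketch using only the standard library. The JSON body and its field names (`record`, `email`) are hypothetical stand-ins, not the real shape of the service's response.

```python
import json

# Hypothetical response body; the real service's field names may differ.
# \\u00c3 is a JSON escape for the character U+00C3 (A with tilde).
body = u'{"record": {"email": "N\\u00c3O-RESPONDER@example.com"}}'

# json.loads turns the JSON text into Python objects with Unicode strings.
parsed = json.loads(body)
email = parsed["record"]["email"]

# Encode to bytes only at the boundary (file, socket, database), not before.
encoded = email.encode("utf-8")

print(repr(email))
print(repr(encoded))
```

With requests, `data.json()` should give the same result as `json.loads(data.text)` for a well-formed JSON response.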
Mark Tolonen