
I am fairly new to Python and Scrapy but have been able to do some basic web scraping. However, I am having issues importing JSON data. I have posted the traceback after the code.

Here is the code that I am using.

from scrapy.spider import Spider
import json

class myspider(Spider):
    name = "jsontest"
    allowed_domains = ["data.sportsillustrated.cnn.com"]
    start_urls = ['http://data.sportsillustrated.cnn.com/jsonp/basketball/nba/gameflash/2012/11/20/32128_playbyplay.json']

    def parse(self, response):
        jsonresponse = json.loads(response.body_as_unicode())
        print jsonresponse 

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 382, in callback
    self._startRunCallbacks(result)
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "jsontest\spiders\jsontest.py", line 10, in parse
    jsonresponse = json.loads(response.body_as_unicode())
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 383, in raw_decode
    raise ValueError("No JSON object could be decoded")
exceptions.ValueError: No JSON object could be decoded

Neil
  • Can you show us the full traceback? – Syed Habib M Jan 28 '14 at 04:38
  • Running `curl --head http://data.sportsillustrated.cnn.com` returns a 403. Clicking on the link in the browser returns a blank page, so I don't think it's a user agent issue. You may need to adjust your parameters. – verbsintransit Jan 28 '14 at 04:38
  • I have added the traceback. If you go to the URL, you will notice that the JSON is wrapped in a callbackWrapper. Could that be causing my issues? – Neil Jan 28 '14 at 04:42

1 Answer


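It's a JSONP response (see "What is JSONP all about?"), so the JSON is wrapped in a function call and json.loads cannot decode the body directly. Here's one good way to parse it: slice out everything between the first "(" and the last ")", then decode that.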

>>> jsonp = response.body
>>> j = jsonp[ jsonp.index("(") + 1 : jsonp.rindex(")") ]
>>> json.loads(j)

Also see this Code Review link.
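If you want the same slicing trick as a reusable helper, something like the following should work. This is only a minimal sketch: `parse_jsonp` is a name I made up, and the sample payload is invented for illustration (the real feed's fields aren't shown in this thread). It assumes the body contains exactly one wrapper call around the JSON.

```python
import json


def parse_jsonp(text):
    """Decode the JSON payload inside a JSONP string such as 'cb({...});'.

    Hypothetical helper: slices between the first "(" and the last ")".
    Raises ValueError if either parenthesis is missing.
    """
    start = text.index("(") + 1  # first character after the opening parenthesis
    end = text.rindex(")")       # position of the final closing parenthesis
    return json.loads(text[start:end])


# Invented sample body; the wrapper name matches the one mentioned in the
# comments, but the fields are assumptions made up for this example.
sample = 'callbackWrapper({"game": {"id": 32128, "plays": ["tip-off", "jump shot (made)"]}});'
data = parse_jsonp(sample)
print(data["game"]["plays"])
```

In the spider's parse method you would presumably pass `response.body_as_unicode()` to the helper instead of the sample string. Using the last ")" rather than the next one matters because the payload itself may contain parentheses inside string values.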

Guy Gavriely
  • Thank you so much. After I posted I realized that it might have something to do with the callbackWrapper. I really appreciate your help. – Neil Jan 28 '14 at 04:48