I need to parse a large JSON file with ijson (unless something else is better). I want to loop through all of the product names in the JSON response and print them. I tried to set this up using the ijson page on PyPI: https://pypi.python.org/pypi/ijson/

This is the current output that I get

<addinfourl at 140643118020800 whose fp = <socket._fileobject object at 0x7fea07882850>>
<generator object items at 0x7fea077dc910>
<generator object <genexpr> at 0x7fea077dc960>

My code

import json
import requests
import lxml 
import ijson
import urllib
from urllib import urlopen


request = urlopen('www.jsonurl.com')
objects = ijson.items(request, 'items.name')
products = (o for o in objects if o ['type' == 'name'])
for product in products:
    print product

print request
print objects
print products

Here is a piece of the json data

{"query":"*","sort":"relevance","responseGroup":"base","totalResults":5158058,"start":1,"numItems":10,"items":[{"itemId":7933617,"parentItemId":7933617,"name":"Nordic Ware Heavyweight Scone / Cornbread Pan","msrp":26.97,"salePrice":20.42,"upc":"011172016409","categoryPath":"Home/Kitchen & Dining/Cookware, Bakeware & Tools/Specialty Cookware","shortDescription":"&lt;p&gt;This Nordic Ware Scone Pan is made of a heavyweight cast aluminum. It can be used as a heavyweight scone or cornbread pan, and it is designed to cook your meal evenly and thoroughly. It features a non-stick interior coating for easy release and clean up.&lt;/p&gt;","longDescription":"&lt;b&gt;Nordic Ware Heavyweight Scone/Cornbread Pan:&lt;/b&gt;&lt;ul&gt;&lt;li&gt;Heavyweight cast aluminum&lt;/li&gt;&lt;li&gt;Ideal for scones and cornbread&lt;/li&gt;&lt;li&gt;Eight wedges&lt;/li&gt;&lt;li&gt;Cooks evenly and thoroughly&lt;/li&gt;&lt;li&gt;Non-stick interior coating for easy release and clean-up&lt;/li&gt;&lt;/ul&gt;","thumbnailImage":"http://i5.walmartimages.com/dfw/dce07b8c-c739/k2-_6fb32a28-c090-4377-81d5-e83273124841.v1.jpg","mediumImage":"http://i5.walmartimages.com/dfw/dce07b8c-ddb3/k2-_6f7df9fa-cb2d-4faf-afbc-8fa4185add59.v1.jpg","largeImage":"http://i5.walmartimages.com/dfw/dce07b8c-5bd3/k2-_6635f62a-5e0b-4c4e-a93d-ee85643f7397.v1.jpg","productTrackingUrl":"http://linksynergy.walmart.com/fs-bin/click?id=|LSNID|&offerid=223073.7200&type=14&catid=8&subid=0&hid=7200&tmpid=1082&RD_PARM1=http%253A%252F%252Fwww.walmart.com%252Fip%252FNordicWare-Heavyweight-Scone-Cornbread-Pan%252F7933617%253Faffp1%253DpjiPu5Y7cvNmz4xZOAs5j7QlW2mZPVmc1DR3BvmrkB4%2526affilsrc%253Dapi","standardShipRate":4.97,"marketplace":false,"modelNumber":"1640","productUrl":"http://c.affil.walmart.com/t/api02?l=http%3A%2F%2Fwww.walmart.com%2Fip%2FNordicWare-Heavyweight-Scone-Cornbread-Pan%2F7933617%3Faffp1%3DpjiPu5Y7cvNmz4xZOAs5j7QlW2mZPVmc1DR3BvmrkB4%26affilsrc%3Dapi%26veh%3Daff%26wmlspartner%3Dreadonlyapi","customerRating":"4.7
","numReviews":20,"customerRatingImage":"http://i2.walmartimages.com/i/CustRating/4_7.gif","categoryNode":"4044_623679_133020","bundle":false,"stock":"Available","addToCartUrl":"http://c.affil.walmart.com/t/api02?l=http%3A%2F%2Faffil.walmart.com%2Fcart%2FaddToCart%3Fitems%3D7933617%7C1%26affp1%3DpjiPu5Y7cvNmz4xZOAs5j7QlW2mZPVmc1DR3BvmrkB4%26affilsrc%3Dapi%26veh%3Daff%26wmlspartner%3Dreadonlyapi","affiliateAddToCartUrl":"http://linksynergy.walmart.com/fs-bin/click?id=|LSNID|&offerid=223073.7200&type=14&catid=8&subid=0&hid=7200&tmpid=1082&RD_PARM1=http%253A%252F%252Faffil.walmart.com%252Fcart%252FaddToCart%253Fitems%253D7933617%257C1%2526affp1%253DpjiPu5Y7cvNmz4xZOAs5j7QlW2mZPVmc1DR3BvmrkB4%2526affilsrc%253Dapi","giftOptions":
turtle02

1 Answer

What you see in your output is:

print request: an open connection to a URL. This looks correct and is not surprising.

print objects: as the output says, this is a generator, where you probably expected a list of values. Since objects really is a generator (you asked for one by using ijson.items), you have to consume the values from it. Typically you do that with list(objects).

print products: also a generator, this time the result of a generator expression. Because you wrapped the expression in (), you asked for a generator. Had you written [o for o in objects if o ['type' == 'name']], you would have got a list directly. The solution is the same as for objects: consume the values, e.g. with list(products).
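The difference between printing a generator and printing its values can be sketched with the standard library alone. This is only an illustration in Python 3 syntax (the question uses Python 2); names() is a hypothetical stand-in for ijson.items(...), and the product names are made up.

```python
def names():
    """Hypothetical stand-in for ijson.items(...): yields values lazily."""
    yield "Scone Pan"
    yield "Muffin Tin"


objects = names()
print(objects)  # prints something like <generator object names at 0x...>

# Parentheses build a generator expression: nothing is computed yet.
lazy = (n for n in names() if "Pan" in n)

# Square brackets build a list comprehension: the list exists immediately.
eager = [n for n in names() if "Pan" in n]

# list() pulls every value out of the generator so it can be printed.
as_list = list(lazy)
print(eager)
print(as_list)
```

Both eager and as_list end up holding the same values; the only difference is when the work is done.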

Be aware that once you consume a value (or all of them) from a generator, it is gone: a generator keeps private internal state that advances with each call.
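A small sketch of that one-shot behaviour, again in Python 3 syntax and with made-up values:

```python
squares = (n * n for n in range(3))

first = next(squares)   # pulls a single value out of the generator
rest = list(squares)    # pulls whatever is left
again = list(squares)   # the generator is exhausted, so this is empty

print(first, rest, again)

# If the values are needed more than once, store them in a list up front.
kept = list(n * n for n in range(3))
print(kept, kept)
```

This is probably why a loop over products followed by list(products) shows nothing the second time: the loop has already used the values up.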

For more, see the SO question "Convert generator object to list for debugging".

Jan Vlcinsky
  • I do not exactly understand what I need to do. With the example code, if I do products = list(object) it does not print anything, but if I print request it prints the entire request. – turtle02 Apr 23 '16 at 21:19
  • @turtle If you have a generator, it does not contain any values; it is only ready to provide them. By calling `list(generator)` you ask the `generator` to retrieve the actual values. Once you have the values, you can print them. If you print the generator itself, you only see a generator object, not the values. – Jan Vlcinsky Apr 23 '16 at 22:16
  • Thanks, that did fix it, but now I get this error: File "/home/python/Desktop/ for event, value in basic_events: File "/usr/local/lib/python2.7/dist-packages/ijson/backends/python.py", line 185, in basic_parse for value in parse_value(lexer): File "/usr/local/lib/python2.7/dist-packages/ijson/backends/python.py", line 108, in parse_value pos, symbol = next(lexer) File "/usr/local/lib/python2.7/dist-packages/ijson/backends/python.py", line 25, in Lexer if type(f.read(0)) == bytetype: AttributeError: 'Response' object has no attribute 'read' – turtle02 Apr 24 '16 at 04:06
  • @turtle02 I would recommend downloading the file first (e.g. to a temporary file) and then reading from that file with ijson. `ijson.items` expects a file-like object (one with a `.read()` method), which is not always easy to get for data read via HTTP. You might also be missing the "http://" prefix in the URL. If that does not solve the issue, it would be better to open a new question focused on opening a URL to get a working file-like object. Starting with a local file should be easier. – Jan Vlcinsky Apr 24 '16 at 18:55
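The suggestion in the last comment could look roughly like the sketch below. It is not taken from the thread and rests on several assumptions: Python 3 (urllib.request instead of the Python 2 urllib in the question), ijson being installed, a placeholder URL, and the prefix 'items.item.name', which assumes the product names sit in the objects of the top-level "items" array as in the sample data. The helper names download_to_tempfile and iter_product_names are made up for illustration.

```python
import shutil
import tempfile
from urllib.request import urlopen


def download_to_tempfile(url):
    """Copy the body of `url` into a temporary file and return it, rewound."""
    tmp = tempfile.TemporaryFile()  # binary mode, removed when closed
    with urlopen(url) as response:
        shutil.copyfileobj(response, tmp)
    tmp.seek(0)  # rewind so the next reader starts at the beginning
    return tmp


def iter_product_names(fileobj):
    """Yield product names from a file-like object (one with .read())."""
    # Third-party package, assumed installed; imported here so that the
    # download helper above works without it.
    import ijson

    # Assumed prefix: the "name" key of each element of the "items" array.
    for name in ijson.items(fileobj, 'items.item.name'):
        yield name


# Usage (placeholder URL; note the explicit "http://" scheme):
# with download_to_tempfile('http://www.example.com/products.json') as f:
#     for name in iter_product_names(f):
#         print(name)
```

Working from a local copy keeps the download and the parsing separate, so a failure in one is easier to tell apart from a failure in the other.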