0

I have a best practice question relating to iterating over a JSON file in Python using try/except statements.

I have a JSON file that looks like this (vastly simplified for the question):

"results": [
      {
       "listingId":"1"
       "address":"123 Main st"
       "landsize":"190 m2"
      },
      {
       "listingId":"2"
       "address":"345 North st"
       "state":"California"
      }
  ]

As I stated, this is heavily simplified: in my actual problem there are about 30 key/value pairs I am interested in, and thousands of records. The challenge is that even though the keys are fairly consistent (it's always roughly the same 30), occasionally a key/value pair will be missing.

If one, two, or even 10 are missing, I still want the rest of the record to be written out, so my approach at the moment is a try/except statement for each key/value pair. That strikes me as a very inefficient way of checking this, and I am sure there is a better way.

My code looks (roughly) like this (which I am sure is not the best way to do this):

for i in range(len(JSON_data["results"])):
   try:
      print "ListingID=" + JSON_data["results"][i]["listingId"]
   except KeyError:
      print "ListingID is unknown"

   try:
      print "Address=" + JSON_data["results"][i]["address"]
   except KeyError:
      print "Address is unknown"

   try:
      print "landsize=" + JSON_data["results"][i]["landsize"]
   except KeyError:
      print "landsize is unknown"

   try:
      print "state =" + JSON_data["results"][i]["state"]
   except KeyError:
      print "state is unknown"

Any advice appreciated!

Calamari
  • Really seems like a good opportunity to put your list of keys into an actual Python list, then use a `for` loop to iterate over them. This reduces the code inside your example loop to about four lines, with only a single `try/except` block. – larsks Oct 27 '18 at 13:49

3 Answers

2

You can use the dict.get() method to avoid having to catch an exception:

listing_id = JSON_data["results"][i].get("listingId")

which returns None if the key is missing, or a different default if you pass one in as the second argument. You can also check if the key is present first:

if 'listingId' in JSON_data["results"][i]:
    # the key is present, do something with the value
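
A minimal sketch of both approaches side by side, written for Python 3. The `record` dict is hypothetical (modelled on the second listing in the question), and the `"unknown"` default is an assumption chosen purely for illustration:

```python
# Hypothetical record modelled on the second listing in the question.
record = {"listingId": "2", "address": "345 North st", "state": "California"}

# .get() returns None when the key is absent ...
landsize = record.get("landsize")

# ... or the second argument, if one is supplied.
landsize_or_default = record.get("landsize", "unknown")

# Membership test: only read the value when the key exists.
if "state" in record:
    state = record["state"]

print(landsize, landsize_or_default, state)
```

Neither form raises `KeyError`, so no `try`/`except` is needed around the lookup.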

Next, avoid range() here. You are much better off looping directly over the results list, so you can refer to each dictionary directly without the whole JSON_data["results"][i] prefix each time:

for nesteddict in JSON_data["results"]:
    if 'listingId' in nesteddict:
        listing_id = nesteddict['listingId']

Next, rather than hard-code all the keys you check, use a loop over a list of keys:

expected_keys = ['listingId', 'address', 'landsize', ...]

for nesteddict in JSON_data["results"]:
    for key in expected_keys:
        if key not in nesteddict:
            print(key, 'is unknown')
        else:
            value = nesteddict[key]
            print('{} = {}'.format(key, value))
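
The same loop can probably be shortened further by combining it with `dict.get()` and a default. This is only a sketch for Python 3: the helper name `describe`, the `missing` placeholder, and the four-key list are assumptions for illustration, not part of the question's real key set:

```python
# Illustrative key list; the real list would hold all ~30 keys.
expected_keys = ["listingId", "address", "landsize", "state"]

def describe(record, keys, missing="unknown"):
    """Return one 'key = value' line per expected key, with a placeholder for gaps."""
    return ["{} = {}".format(key, record.get(key, missing)) for key in keys]

# Hypothetical record modelled on the second listing in the question.
record = {"listingId": "2", "address": "345 North st", "state": "California"}
for line in describe(record, expected_keys):
    print(line)
```

Returning the lines instead of printing them directly makes it easier to write the record out elsewhere, for example to a file.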

If you don't need to print that a key is missing, then you could also make use of dictionary views, which act as sets. Sets support intersection operations, so you could ask for the intersection between your expected keys and the available keys:

# note, using a set here now
expected_keys = {'listingId', 'address', 'landsize', ...}

for nesteddict in JSON_data["results"]:
    for key in nesteddict.keys() & expected_keys:  # get the intersection
        # each key is guaranteed to be in nesteddict
        value = nesteddict[key]
        print('{} = {}'.format(key, value))

This for loop only ever deals with keys both in nesteddict and in expected_keys, nothing more.
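
A runnable sketch of the intersection approach, assuming Python 3, where `dict.keys()` returns a set-like view (on Python 2.7 the equivalent is `dict.viewkeys()`). The sample data follows the question; the `agent` key and the helper name `present_items` are hypothetical additions for illustration. Because sets are unordered, the sketch sorts the keys to get a stable output order:

```python
# Sample data modelled on the question; "agent" is a hypothetical extra key.
JSON_data = {
    "results": [
        {"listingId": "1", "address": "123 Main st", "landsize": "190 m2"},
        {"listingId": "2", "address": "345 North st", "state": "California",
         "agent": "n/a"},
    ]
}

expected_keys = {"listingId", "address", "landsize", "state"}

def present_items(record, wanted):
    """Return (key, value) pairs for wanted keys that exist in record, sorted by key."""
    # record.keys() & wanted is the set of keys found in both.
    return [(key, record[key]) for key in sorted(record.keys() & wanted)]

for nesteddict in JSON_data["results"]:
    for key, value in present_items(nesteddict, expected_keys):
        print("{} = {}".format(key, value))
```

Keys outside `expected_keys`, such as `agent` here, are silently skipped, and missing keys produce no output at all.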

Martijn Pieters
  • Awesome answer, thanks. I have made your initial suggested changes, much cleaner (and probably faster/efficient)... will explore what you're suggesting with dictionary views... I am not at all familiar with these. – Calamari Oct 27 '18 at 14:02
  • (Also, followup question, using the expected_keys list, how do I cater for nested keys?)... I am sure I will figure it out, but just wondered if you had the solution. – Calamari Oct 27 '18 at 14:31
  • That'd usually require more tailored approaches; for a nested object perhaps use `if 'specifickey' in nesteddict:` then handle that specific nested object with a separate set of keys to expect. – Martijn Pieters Oct 27 '18 at 14:38
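
For the nested-key follow-up, one possible alternative to a separate key set per nested object is to describe each wanted value as a path of keys. This is a sketch only, for Python 3; the helper `get_path`, the nested `geo` object, and its field names are all hypothetical and do not appear in the question's data:

```python
def get_path(record, path, default=None):
    """Follow a sequence of keys into nested dicts; return default if any step is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Hypothetical record with a nested "geo" object.
record = {"listingId": "1", "geo": {"lat": "-33.86", "lng": "151.21"}}

# Each expected value is a tuple of keys leading to it.
expected_paths = [("listingId",), ("geo", "lat"), ("geo", "postcode")]
for path in expected_paths:
    print(".".join(path), "=", get_path(record, path, "unknown"))
```

This only handles dicts inside dicts; lists of nested objects would still need the more tailored handling described above.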
1

You could loop through the key names too - meaning you only have 1 try/except. Since it’s in a loop, it repeats the same code for each key, changing the key name each cycle.

for i in range(len(JSON_data["results"])):
    for key in ('listingId', 'address', 'landsize', 'state'):
        try:
            print '{}: {}'.format(key, JSON_data["results"][i][key])
        except KeyError:
            print '{} is unknown'.format(key)

If I’m not mistaken, you could also make your code cleaner by iterating over the results directly:

for result in JSON_data['results']:
    ...

And where you write JSON_data['results'][i], change it to simply result.

Note: you mentioned your actual data is much more complex than this. If there are many key names, it may make sense to store them externally (or at least somewhere else). You could create a file of key names and build a list of the names like this:

with open('key_names.txt', 'r') as f:
    key_names = [line.strip() for line in f]
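
Putting the two ideas together might look like the sketch below (Python 3). The file contents, the temporary directory, and the helper name `load_key_names` are assumptions for illustration; in practice `key_names.txt` would already exist with one key name per line:

```python
import os
import tempfile

def load_key_names(path):
    """Read one key name per line, skipping blank lines."""
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]

# Create a throwaway key_names.txt so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
key_file = os.path.join(tmp_dir, "key_names.txt")
with open(key_file, "w") as f:
    f.write("listingId\naddress\nlandsize\nstate\n\n")

key_names = load_key_names(key_file)

# Hypothetical record modelled on the first listing in the question.
result = {"listingId": "1", "address": "123 Main st", "landsize": "190 m2"}
for key in key_names:
    try:
        print("{}: {}".format(key, result[key]))
    except KeyError:
        print("{} is unknown".format(key))
```

Skipping blank lines guards against a trailing newline producing an empty key name.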
N Chauhan
0

Here is the method I would use for iterating over a JSON object and listing out the values I want. Also, please ensure that your JSON objects are properly formatted before posting here.

import json


def import_data():
    data = """
    {
"results": [
      {
       "listingId":"1",
       "address":"123 Main st",
       "landsize":"190 m2"
      },
      {
       "listingId":"2",
       "address":"345 North st",
       "state":"California"
      }
  ]
}
"""
    return data

def format_Data():
    data = import_data()
    data = json.loads(data)
    array = []
    print data
    for data_item in data['results']:
        for key, value in data_item.items():
            if key == 'listingId':
                listingId = value
                print ('ListingID= {}').format(listingId)
            elif key == 'address':
                address = value
                print ('Address= {}').format(address)
            elif key == 'landsize':
                landsize = value
                print ('Landsize= {}').format(landsize)
            elif key == 'state':
                state = value
                print ('State= {}').format(state)

format_Data()

Output:

{u'results': [{u'landsize': u'190 m2', u'listingId': u'1', u'address': u'123 Main st'}, {u'state': u'California', u'listingId': u'2', u'address': u'345 North st'}]}
Landsize= 190 m2
ListingID= 1
Address= 123 Main st
State= California
ListingID= 2
Address= 345 North st
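
The if/elif chain above could arguably be replaced by a lookup table that maps each key to its printed label. This is a sketch for Python 3 rather than the Python 2 used above; the names `LABELS` and `format_record` and the shortened sample data are assumptions for illustration:

```python
import json

# Map each JSON key to the label used in the printed output.
LABELS = {
    "listingId": "ListingID",
    "address": "Address",
    "landsize": "Landsize",
    "state": "State",
}

def format_record(record):
    """Return 'Label= value' lines for the keys of record that appear in LABELS."""
    return ["{}= {}".format(LABELS[key], value)
            for key, value in record.items() if key in LABELS]

# Shortened sample data in the same shape as the question's JSON.
data = json.loads('{"results": [{"listingId": "2", "state": "California"}]}')
for record in data["results"]:
    for line in format_record(record):
        print(line)
```

Adding a new key then means adding one entry to `LABELS` instead of another `elif` branch.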

Sage Hopkins