
I am trying to loop through a list of objects deleting an element from each object. Each object is a new line. I am trying to then save the new file as is without the element contained within the objects.

{
    "business_id": "fNGIbpazjTRdXgwRY_NIXA",
    "full_address": "1201 Washington Ave\nCarnegie, PA 15106",
    "hours": {
        "Monday": {
            "close": "23:00",
            "open": "11:00"
        },
        "Tuesday": {
            "close": "23:00",
            "open": "11:00"
        },
        "Friday": {
            "close": "23:00",
            "open": "11:00"
        },
        "Wednesday": {
            "close": "23:00",
            "open": "11:00"
        },
        "Thursday": {
            "close": "23:00",
            "open": "11:00"
        },
        "Saturday": {
            "close": "23:00",
            "open": "11:00"
        }
    },
    "open": true,
    "categories": ["Bars", "American (Traditional)", "Nightlife", "Lounges", "Restaurants"],
    "city": "Carnegie",
    "review_count": 7,
    "name": "Rocky's Lounge",
    "neighborhoods": [],
    "longitude": -80.0849416,
    "state": "PA",
    "stars": 4.0,
    "latitude": 40.3964688,
    "attributes": {
        "Alcohol": "full_bar",
        "Noise Level": "average",
        "Music": {
            "dj": false
        },
        "Attire": "casual",
        "Ambience": {
            "romantic": false,
            "intimate": false,
            "touristy": false,
            "hipster": false,
            "divey": false,
            "classy": false,
            "trendy": false,
            "upscale": false,
            "casual": false
        },
        "Good for Kids": true,
        "Wheelchair Accessible": true,
        "Good For Dancing": false,
        "Delivery": false,
        "Dogs Allowed": false,
        "Coat Check": false,
        "Smoking": "no",
        "Accepts Credit Cards": true,
        "Take-out": true,
        "Price Range": 1,
        "Outdoor Seating": false,
        "Takes Reservations": false,
        "Waiter Service": true,
        "Wi-Fi": "free",
        "Caters": false,
        "Good For": {
            "dessert": false,
            "latenight": false,
            "lunch": false,
            "dinner": false,
            "brunch": false,
            "breakfast": false
        },
        "Parking": {
            "garage": false,
            "street": false,
            "validated": false,
            "lot": true,
            "valet": false
        },
        "Has TV": true,
        "Good For Groups": true
    },
    "type": "business"
}

I need to remove the information contained within the hours element; however, the information is not always the same. Some objects contain all the days, and some contain only one or two days.

This is the code I've tried:

import json

with open('data.json') as data_file:
    data = json.load(data_file)
    for element in data: 
        del element['hours']

However, I am getting an error when running the code:

TypeError: 'str' object does not support item deletion
cottontail
Bradley
  • Do you want to delete the whole hours key and its value (days, close, open), or just some fields in it? – lc123 Apr 13 '16 at 19:16
  • Hi Ic123, Yes I would like to get rid of all the data inside and including the hours element so it is not visible anymore. I have a list of 20,000 objects which start on a new line each time that I need to try loop through and remove "hours" and everything inside this element for the whole file. Will I need to write the results to a new file output? or should the code edit the current file? Any directions would be appreciated. – Bradley Apr 13 '16 at 19:26
  • Hi @Bradley, after you remove the `hours` key like @Apero suggested, you need to write the result to a new file or overwrite the old file. Otherwise your file will not 'know' about your changes. – lc123 Apr 13 '16 at 19:37
  • `file = open("newfile.json", "w")` `file.write(data)` Something like this doesn't seem to work. Sorry if I'm coming across as lazy on this one. I'm literally trying to modify the file to use in Pig as I cannot get the elephant bird jar files to work at all. I can only feed Pig using one indent in the JSON file. I actually haven't a clue how to use Python but have been trying all day to find something that can be used to manipulate my JSON file. Thanks @Ic123 – Bradley Apr 13 '16 at 19:48

2 Answers


Let's assume you want to overwrite the same file:

import json

with open('data.json', 'r') as data_file:
    data = json.load(data_file)

for element in data:
    element.pop('hours', None)

with open('data.json', 'w') as data_file:
    json.dump(data, data_file)  # json.dump returns None, so there is nothing to assign

dict.pop(<key>, None) is probably what you were looking for, if I understood your requirements, because it removes the hours key if present and does not fail if the key is absent.
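As a small illustration of that difference (the `record` dict below is made up for the example), `pop` with a default stays silent when the key is missing, while `del` raises `KeyError`:

```python
# Hypothetical record without an 'hours' key, for illustration only.
record = {"name": "Rocky's Lounge", "city": "Carnegie"}

# pop with a default returns the default instead of raising.
removed = record.pop("hours", None)

# del on a missing key raises KeyError.
try:
    del record["hours"]
    del_failed = False
except KeyError:
    del_failed = True

print(removed, del_failed)
```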

However I am not sure I understand why it makes a difference to you whether the hours key contains some days or not, because you just want to get rid of the whole key/value pair, right?

Now, if you really want to use del instead of pop, here is how you could make your code work:

import json

with open('data.json') as data_file:
    data = json.load(data_file)

for element in data:
    if 'hours' in element:
        del element['hours']

with open('data.json', 'w') as data_file:
    json.dump(data, data_file)  # json.dump returns None, so there is nothing to assign

If you want to write it to another file, just change the filename in the second open statement.

I had to change the indentation, as you might have noticed, so that the file is already closed before the data cleanup phase and can be overwritten at the end.

with is a statement that works with what is called a context manager: whatever it provides (here the data_file file object) is meant to be used only within that block. As soon as the indented with block ends, the file is closed and the context ends, so the file object can no longer be used for reading or writing.
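A minimal sketch of that behavior, assuming a throwaway file in a temporary directory (the path and contents are made up so no real data is touched):

```python
import json
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "example.json")

with open(path, "w") as out_file:
    json.dump([{"name": "a", "hours": {}}], out_file)

# The with block has ended, so the file object is closed.
closed_after_block = out_file.closed

# Using the closed file object raises ValueError.
try:
    out_file.write("more")
    write_failed = False
except ValueError:
    write_failed = True

print(closed_after_block, write_failed)
```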

Closing the file first means you can cleanly reopen it in write mode and get a new file object to write into.
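The question mentions that each object sits on its own line. If the file really is one JSON object per line rather than a single JSON array, loading the whole file with json.load would not give a list of dicts. Here is a hedged sketch for that layout; the helper name `strip_key_from_json_lines` and the file names are hypothetical, and it assumes every non-blank line is a complete JSON object:

```python
import json
import os
import tempfile


def strip_key_from_json_lines(src_path, dst_path, key="hours"):
    """Copy a one-object-per-line JSON file, dropping `key` from each object.

    Hypothetical helper; assumes every non-blank line is a JSON object.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            record.pop(key, None)  # no error if the key is absent
            dst.write(json.dumps(record) + "\n")


# Small demonstration with made-up records in a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "data.json")
dst_path = os.path.join(workdir, "data_no_hours.json")

with open(src_path, "w") as f:
    f.write('{"name": "A", "hours": {"Monday": {"open": "11:00"}}}\n')
    f.write('{"name": "B"}\n')

strip_key_from_json_lines(src_path, dst_path)

with open(dst_path) as f:
    cleaned = [json.loads(line) for line in f]

print(cleaned)
```

Writing to a separate output file and processing one line at a time keeps the original intact and avoids loading all the objects into memory at once.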

mkrieger1
DevLounge
  • Is storing the result of `json.dump()` a mistake? [AFAIK](https://docs.python.org/2/library/json.html#json.dump), `json.dump()` does not have a documented return value. – Robᵩ Apr 13 '16 at 20:21
  • what would you need it to return? its result is simply the file it wrote, no? – DevLounge Apr 13 '16 at 20:22
  • it said RuntimeError: dictionary changed size during iteration – Kardi Teknomo Apr 24 '19 at 08:25
  • If you use del element[key] while iterating over the same structure in a for loop, then you get an error that the structure has changed – Golden Lion Feb 05 '21 at 15:20
  • I delete a key from a dict that is an item of the list. The list iterator yields each item (the dict we delete a key from), so the list size does NOT change during iteration. – DevLounge Feb 24 '21 at 23:45

If the data is a dictionary, as is the case in the OP (you can verify with print(type(data)), which shows <class 'dict'>), there's no need for a for-loop; a single pop() call (or del statement) does the job. The error is triggered because for element in data: ... iterates over the keys of the dictionary (which are strings), and strings don't support item deletion.

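import copy
import json

with open('data.json') as f:
    data1 = json.load(f)

# Make an independent copy; calling json.load(f) a second time would fail
# because the file has already been read to the end.
data2 = copy.deepcopy(data1)

print(type(data1))        # <class 'dict'>
data1.pop('hours', None)  # remove the 'hours' key and its corresponding value
del data2['hours']        # del can be used as well

print(data1 == data2)     # True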
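If you are not sure whether the loaded data is a single dictionary or a list of dictionaries, one option is to check the type before removing the key. The sketch below uses a hypothetical helper, `remove_key`, and made-up inline JSON strings rather than the real file:

```python
import json


def remove_key(data, key="hours"):
    """Remove `key` from a dict, or from every dict in a list (hypothetical helper)."""
    if isinstance(data, dict):
        data.pop(key, None)
    elif isinstance(data, list):
        for element in data:
            if isinstance(element, dict):
                element.pop(key, None)
    return data


single = json.loads('{"name": "A", "hours": {"Monday": {}}}')
many = json.loads('[{"name": "A", "hours": {}}, {"name": "B"}]')

# Iterating over a dict yields its keys, which are strings.
key_types = {type(k).__name__ for k in single}

single_result = remove_key(single)
many_result = remove_key(many)
print(key_types, single_result, many_result)
```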

If you need to loop through a dictionary to remove keys (perhaps depending on their values), then to prevent a RuntimeError: dictionary changed size during iteration, make a copy of the keys first and iterate over the copy. An example is as follows.

data = {
    'business_id': 'fNGIbpazjTRdXgwRY_NIXA',
    'full_address': '1201 Washington Ave Carnegie, PA 15106',
    'attributes': None,
    'open': True,
    'review_count': 7,
    'name': "Rocky's Lounge",
    'stars': 4.0,
    'type': 'business'
}

for k, v in data.items():
    if v is None:
        data.pop(k)        # <---- RuntimeError: dictionary changed size during iteration


for k in list(data):
#        ^^^^^^^^^^   <---- make a copy here
    if data[k] is None:
        data.pop(k)        # <--- OK
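Another way to sidestep the problem, if building a new dictionary is acceptable, is a dict comprehension that keeps only the entries you want. A minimal sketch with made-up data:

```python
# Made-up data for illustration.
data = {"name": "Rocky's Lounge", "attributes": None, "stars": 4.0}

# Build a new dict that keeps only entries whose value is not None.
filtered = {k: v for k, v in data.items() if v is not None}

print(filtered)
```

The original `data` is left unchanged here, so nothing is mutated during iteration.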
cottontail