I am working on code that extracts data from a JSON file. Here is the JSON file: Google CDN

Here is a sample of the JSON:

{
  "syncToken": "1677578581095",
  "creationTime": "2023-02-28T02:03:01.095938",
  "prefixes": [{
    "ipv4Prefix": "34.80.0.0/15",
    "service": "Google Cloud",
    "scope": "asia-east1"
  }, {
    "ipv4Prefix": "34.137.0.0/16",
    "service": "Google Cloud",
    "scope": "asia-east1"
  }, {
    "ipv4Prefix": "35.185.128.0/19",
    "service": "Google Cloud",
    "scope": "asia-east1"
  }, {
    "ipv6Prefix": "2600:1900:40a0::/44",
    "service": "Google Cloud",
    "scope": "asia-south1"
  },

I know where the problem is, but I cannot fix it with the solutions on this website, and I get a different error every time.

This is my code:

import json
f = open('cloud.json')
data = json.load(f)
array = []

for i in data['prefixes']:
    array = [i['prefix'] for i in data['ipv4Prefix']]
f_path = (r"ip.txt")
with open (f_path ,'w') as d:
       for lang in array:
        d.write("{}\n".format(lang))
f.close()

Basically, I want to extract only the IPv4 addresses, but some IPv6 entries are scattered through the block and cause this error, so I get key error like this: KeyError: 'ipv4Prefix'

I know why I am getting this error, so I tried deleting every entry with ipv6Prefix by adding this part to my code:

    if data[i]["prefixes"] == "ipv6Prefix":
        data.pop(i)

For this one I get TypeError: unhashable type: 'dict' which is new to me. I also tried the following, as someone pointed out in another question, but it didn't work:

del data[ipv6Prefix]

My final code now looks like this, and I am getting this error: TypeError: list indices must be integers or slices, not str, which is understandable.

import json
f = open('cloud.json')
data = json.load(f)
array = []
for i in data['prefixes']:
    if [i]["prefixes"] == ['ipv6Prefix']:
        data.pop(i)
    array = [i['prefix'] for i in data['ipv4Prefix']]
f_path = (r"ip.txt")
with open (f_path ,'w') as d:
       for lang in array:
        d.write("{}\n".format(lang))
f.close()

So how can I delete entries with 'ipv6Prefix' or better to say, ignore them in my for loop?

I found this question, but the answer does not fit my code at all.

What's the problem with my code?

I tried several methods like del and dict.pop(), but I still get errors.

OneCricketeer
  • `array = [i['prefix'] for i in data['ipv4Prefix']]` replaces `array` each time through the loop. You probably want to append to `array`, not reassign it. – Barmar Feb 28 '23 at 22:54
  • There is no `data['ipv4Prefix']`. And it's not a list, so why do you want to loop through it? There's also no `prefix` key in any of the dictionaries, so what is `i['prefix']` supposed to be? – Barmar Feb 28 '23 at 22:56
  • I don't think append helps in this case, but let me try it. About the `i['prefix']` you mentioned: if you open the link you'll see around 540 entries, which is why I used it. It lets me take one entry at a time, and then with `data['ipv4Prefix']` I can extract all the IPv4 addresses. At least, that is what I tried for another JSON file and it worked perfectly; I can share the code if you like. @Barmar – Hosein Nikkhah Feb 28 '23 at 23:02
  • @Barmar Check this [link](https://codeshare.io/PdEwNo). You can see I used the same approach in both of my codes, but this one does not work because of the random ipv6Prefix objects in it – Hosein Nikkhah Feb 28 '23 at 23:10
  • I checked the link, I don't see `prefix` anywhere. There's just `ipv4Prefix` and `ipv6Prefix`. – Barmar Mar 01 '23 at 15:04
  • It's for another JSON file. The code I shared in that link extracts data from AWS, whereas in the code I shared on Stack Overflow I am trying to extract from the Google JSON file – Hosein Nikkhah Mar 01 '23 at 21:17

2 Answers

You have two choices: Look Before You Leap or Easier to Ask Forgiveness than Permission. In short:

  • LBYL: Do an if check to make sure ipv4Prefix exists
  • EAFP: Assume that ipv4Prefix exists but catch the exception (a KeyError in this case)

Here is some code that demonstrates both approaches. It does not include writing out the results.

import json


def lbyl(data: dict):
    """Look before you leap"""
    ipv4s = []

    for prefix in data["prefixes"]:
        # Ensure that "ipv4Prefix" exists
        if "ipv4Prefix" in prefix:
            ipv4s.append(prefix["ipv4Prefix"])
    return ipv4s


def eafp(data: dict):
    """Easier to Ask Forgiveness than Permission"""
    ipv4s = []

    for prefix in data["prefixes"]:
        try:
            ipv4s.append(prefix["ipv4Prefix"])
        except KeyError:
            # This happens when "ipv4Prefix" is not in prefix
            pass

    return ipv4s


def get_data(path) -> dict:
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    data = get_data("cloud.json")
    print(lbyl(data))
    print(eafp(data))

Which style to use is subjective. Python has a reputation for preferring EAFP, but I prefer to use LBYL if errors are expected as part of normal operation. In your case you know that some objects will not have ipv4Prefix, so I contend that LBYL is more suitable here.
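
A compact form of the LBYL check is dict.get(), which returns None instead of raising when the key is missing. The following is only a sketch: the helper name with_get is made up for illustration, and the sample is trimmed from the JSON in the question.

```python
def with_get(data: dict) -> list:
    """Collect IPv4 prefixes using dict.get() (hypothetical helper name)."""
    ipv4s = []
    for prefix in data["prefixes"]:
        value = prefix.get("ipv4Prefix")  # None when the key is absent
        if value is not None:
            ipv4s.append(value)
    return ipv4s


# Sample trimmed from the JSON shown in the question
sample = {
    "prefixes": [
        {"ipv4Prefix": "34.80.0.0/15", "service": "Google Cloud", "scope": "asia-east1"},
        {"ipv6Prefix": "2600:1900:40a0::/44", "service": "Google Cloud", "scope": "asia-south1"},
    ]
}
print(with_get(sample))  # ['34.80.0.0/15']
```

The lookup and the existence check happen in one call, at the cost of treating a key whose value is None the same as a missing key.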

Eric Grunzke
  • Both approaches are amazing; I didn't know LBYL was that handy – Hosein Nikkhah Mar 01 '23 at 21:36
  • IMHO you should use LBYL when the difference is expected, while EAFP is for unexpected missing data. Python also has `dict.get()`, which can be a simple form of LBYL. – Barmar Mar 01 '23 at 21:37

So how can I delete entries with 'ipv6Prefix' or better to say, ignore them in my for loop?

You can skip/ignore prefixes containing ipv6Prefix with if...continue:

# import json
# with open('cloud.json') as f: data = json.load(f) ## safer than f=open...

with open("ip.txt", 'w') as d:
    for prefix_i in data['prefixes']:
        # if 'ipv6Prefix' not in prefix_i: d.write(f"{prefix_i}\n") ## OR
        if 'ipv6Prefix' in prefix_i: continue
        d.write("{}\n".format(prefix_i))
    ## generator expression INSTEAD OF for-loop:
    # d.write('\n'.join(str(p) for p in data['prefixes'] if 'ipv6Prefix' not in p))

You can write only prefixes containing ipv4Prefix with if 'ipv4Prefix' in...

with open ("ip.txt" ,'w') as d:
    for prefix_i in data['prefixes']:
        if 'ipv4Prefix' in prefix_i: d.write("{}\n".format(prefix_i))

You can alter data itself to omit prefixes containing ipv6Prefix with list comprehension:

data['prefixes'] = [p for p in data['prefixes'] if 'ipv6Prefix' not in p]

You can save a list of prefixes containing ipv4Prefix as JSON with json.dump:

## to just save the list as a variable:
# ipv4Prefixes = [p for p in data['prefixes'] if 'ipv4Prefix' in p]

with open('ipv4Prefixes.json', 'w') as f:
    json.dump([p for p in data['prefixes'] if 'ipv4Prefix' in p], f)


getting this error: TypeError: list indices must be integers or slices, not str

That's probably due to the if [i]["prefixes"] == ['ipv6Prefix']: line; [i] is a list with just a single item [i, which is a dictionary], and a list can only be indexed with an integer or a slice, so [i]["prefixes"] just doesn't make any sense. You can use if 'ipv6Prefix' in i instead [i is already one of the items in data['prefixes'], so it has no "prefixes" key of its own], but there are more issues with what you're trying to accomplish in that block [I'll explain in the next section].
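
A small reproduction, as a sketch only: the entry below is taken from the question's sample, and the variable names wrapped, message and has_ipv6 are made up for illustration. It shows that indexing a list with a string raises the reported TypeError, while a membership test on the dictionary itself works.

```python
i = {"ipv6Prefix": "2600:1900:40a0::/44", "service": "Google Cloud"}

wrapped = [i]  # same thing as the [i] in the question: a one-item list
try:
    wrapped["prefixes"]  # a list cannot be indexed with a string
except TypeError as err:
    message = str(err)
    print(message)  # list indices must be integers or slices, not str

has_ipv6 = "ipv6Prefix" in i  # membership test against the dictionary's keys
print(has_ipv6)  # True
```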


# for i in data['prefixes']...
        data.pop(i)

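data is a dictionary, so data.pop(i) treats i as a key; but i is itself a dictionary [one of the items in data['prefixes']], which is unhashable, so .pop(i) would raise a TypeError if there's an attempt to execute it. Calling it on the list instead, as in data['prefixes'].pop(i), would not work either: a list's .pop method only takes an integer as input [which has to be the index of the item you want to remove from that list].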

You could loop through enumerate(data['prefixes']) (instead of just data['prefixes']) to keep track of the index associated with i, but keep in mind that looping through a list to pop multiple items [from that same list] is NOT advisable at all. For example, if you pop the second item from the list [index=1], then the indices of all items after it will decrease by one; so if you next need to pop what was originally the 5th item in the list, enumerate will tell you that its index is 4, but it actually became 3 after executing .pop(1)...
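
The index shift can be seen on a toy list. This is only an illustration with made-up values, not the question's data:

```python
items = ["a", "b", "c", "d", "e"]

original_index = items.index("e")  # "e" is the 5th item, so index 4

items.pop(1)  # remove "b"; every later item moves one place to the left
shifted_index = items.index("e")

print(original_index, shifted_index)  # 4 3

# Removing items inside a forward loop over the same list skips neighbours
numbers = [1, 2, 2, 3]
for n in numbers:
    if n == 2:
        numbers.remove(n)
print(numbers)  # [1, 2, 3] (the second 2 was never visited)
```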

You could loop through the list in reverse as below (but isn't the list-comprehension approach I suggested before simpler?)

last = len(data['prefixes']) - 1
for ri, p in enumerate(reversed(data['prefixes'])):
    if 'ipv6Prefix' in p: data['prefixes'].pop(last - ri)  # last - ri is p's index

Btw, instead of applying reversed, you can also use slicing like data['prefixes'][::-1]. I just thought using the function is better for readability because it makes it very obvious what it's doing.


    if data[i]["prefixes"] == "ipv6Prefix":

for this one I get TypeError: unhashable type: 'dict' which is new to me

i is a dictionary (which is unhashable, as the error message said), and therefore cannot be used as a key the way ....data[i]... is trying to.
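
A minimal reproduction, assuming data has the same shape as the question's JSON (the variable name message is made up for illustration). The exact wording of the error text can differ between Python versions, so no expected output is shown.

```python
data = {"prefixes": [{"ipv6Prefix": "2600:1900:40a0::/44", "scope": "asia-south1"}]}

for i in data["prefixes"]:  # i is a dict, exactly as in the question's loop
    try:
        data[i]  # dictionary keys must be hashable, and a dict is not
    except TypeError as err:
        message = str(err)
        print(message)

# Strings are hashable, so they work as keys
print(data["prefixes"][0]["scope"])  # asia-south1
```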


so I get key error like this: KeyError: 'ipv4Prefix'

Probably from the data['ipv4Prefix'] bit in array = [i['prefix'] for i in data['ipv4Prefix']], because data does not have a key ipv4Prefix; some of the i's in for i in data['prefixes'] might. There is also no point in using something like if 'ipv6Prefix' in i: del i, because del i only unbinds the loop variable; it does not remove the item from the list being looped through.

You can try using .remove like data['prefixes'].remove(i) [instead of del i], but I don't think that would be very efficient. List comprehension is definitely my preferred method in this case [and also probably considered the most "pythonic" approach here].
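
Putting the list-comprehension approach together with the file writing from the question, a sketch could look like the following. The function name write_ipv4_prefixes is hypothetical, the sample is trimmed from the question's JSON, and a temporary directory stands in for the real location of ip.txt.

```python
import json
import os
import tempfile


def write_ipv4_prefixes(data: dict, out_path: str) -> list:
    """Write one IPv4 prefix per line to out_path and return the prefixes."""
    ipv4s = [p["ipv4Prefix"] for p in data["prefixes"] if "ipv4Prefix" in p]
    with open(out_path, "w") as out:
        out.writelines(f"{prefix}\n" for prefix in ipv4s)
    return ipv4s


# Sample trimmed from the question; a real run would json.load() cloud.json
sample = json.loads("""
{"prefixes": [
  {"ipv4Prefix": "34.80.0.0/15", "service": "Google Cloud", "scope": "asia-east1"},
  {"ipv6Prefix": "2600:1900:40a0::/44", "service": "Google Cloud", "scope": "asia-south1"},
  {"ipv4Prefix": "35.185.128.0/19", "service": "Google Cloud", "scope": "asia-east1"}
]}
""")

out_path = os.path.join(tempfile.mkdtemp(), "ip.txt")
result = write_ipv4_prefixes(sample, out_path)
print(result)  # ['34.80.0.0/15', '35.185.128.0/19']
```

Unlike the loops above, which write the whole dictionary for each entry, this version writes only the value stored under ipv4Prefix.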

Driftr95