
I'm new to Python and I'm trying to do the following: send a GET request to the IP of a Cloudera Manager instance, which returns a JSON of hosts with the following structure:

{
  "items" : [ {
    "hostId" : "ddcfbea6-8a7c-462c-38f9-0116338e438a",
    "ipAddress" : "1.2.3.4",
    "hostname" : "host.example.com",
    "rackId" : "/rack01",
    "hostUrl" : "http://host.example.com:7180/cmf/hostRedirect/ddcfbea6-8a7c-462c-38f9-0116338e438a"
  }
...
}

The JSON can contain hundreds of elements, and I'd like to find all the elements that share the same ipAddress value and print them along with all their keys and values. How can I achieve this? I'm sending the GET request using the requests module.

Harald Nordgren
John Doe
  • possibly related: https://stackoverflow.com/questions/17076345/remove-duplicates-from-json-data#17076552 – Gavin Achtemeier Jul 30 '17 at 20:06
  • 1) Does the JSON have a regular structure, or is it arbitrarily structured? 2) Do you know what IP you are looking for? – cs95 Jul 30 '17 at 20:09
  • Do you want to group the data by IP address, or do you have a specific address and want all the entries that match it? – Simon Hobbs Jul 30 '17 at 20:17
  • @COLDSPEED - it does have a regular structure, the same as the one I posted in the question. I do not know the IP I'm looking for; I'm just searching for duplicates. – John Doe Jul 31 '17 at 06:03
  • @SimonHobbs, I don't want to group it. I'd prefer to scan the JSON as is for duplicate IP addresses, and when I find duplicates, print them out including the rest of their keys and values (such as rackId, hostUrl, etc.), not only the IP address. – John Doe Jul 31 '17 at 06:06

2 Answers


For a JSON object called hosts,

hosts = {
  "items" : [ {
    "hostId" : "ddcfbea6-8a7c-462c-38f9-0116338e438a",
    "ipAddress" : "1.2.3.4",
    "hostname" : "host.example.com",
    "rackId" : "/rack01",
    "hostUrl" : "http://host.example.com:7180/cmf/hostRedirect/ddcfbea6-8a7c-462c-38f9-0116338e438a"
  }
...
}

You can group the items by IP address like this:

# Map each IP address to the list of host entries that use it
grouped_items = {}
for item in hosts["items"]:
    ip_address = item["ipAddress"]
    if ip_address in grouped_items:
        grouped_items[ip_address].append(item)
    else:
        grouped_items[ip_address] = [item]
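
Since the question asks for the duplicates specifically, the grouping above could be followed by a filtering step. The following is only a minimal sketch: the three sample hosts are made up for illustration (they are not real Cloudera Manager output), and dict.setdefault is used as a shorter form of the if/else above.

```python
# Made-up sample data in the same shape as the JSON from the question.
hosts = {
    "items": [
        {"hostId": "a1", "ipAddress": "1.2.3.4", "hostname": "host1.example.com"},
        {"hostId": "b2", "ipAddress": "1.2.3.5", "hostname": "host2.example.com"},
        {"hostId": "c3", "ipAddress": "1.2.3.4", "hostname": "host3.example.com"},
    ]
}

# Group the items by IP address (same idea as the loop above).
grouped_items = {}
for item in hosts["items"]:
    grouped_items.setdefault(item["ipAddress"], []).append(item)

# Keep only the addresses that are shared by more than one host.
duplicates = {ip: items for ip, items in grouped_items.items() if len(items) > 1}

# Print every key and value of each host that shares an address.
for ip, items in duplicates.items():
    print("Duplicate IP:", ip)
    for item in items:
        for key, value in item.items():
            print("   ", key, "=", value)
```

With real data, hosts would presumably come from the response of the GET request (for example response.json() when using requests) rather than from a literal.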
Harald Nordgren

You can create a dictionary that maps each IP address to the list of objects that have it. For example (if d is your example dictionary):

ipToObjects = {}

for item in d['items']:
    # Skip entries that have no IP address at all
    if 'ipAddress' not in item:
        continue
    ip = item['ipAddress']
    if ip not in ipToObjects:
        ipToObjects[ip] = []
    ipToObjects[ip].append(item)

Now if you want to look for duplicates you can just do this:

duplicates = [ip for ip in ipToObjects if len(ipToObjects[ip]) > 1]
for ip in duplicates:
    print(ipToObjects[ip])

Or do similar things according to your needs.
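
If you would rather not build the groups and just scan the list as is, one possible variation is to count the addresses first and then keep every item whose address was seen more than once. This is a rough sketch, not a definitive implementation: find_duplicate_hosts is a hypothetical helper name, and raw is a made-up response body standing in for what the GET request returns.

```python
import json
from collections import Counter

# Made-up response body; with requests this would be response.text,
# or you could call response.json() and skip json.loads entirely.
raw = """
{
  "items": [
    {"hostId": "a1", "ipAddress": "1.2.3.4", "rackId": "/rack01"},
    {"hostId": "b2", "ipAddress": "1.2.3.5", "rackId": "/rack01"},
    {"hostId": "c3", "ipAddress": "1.2.3.4", "rackId": "/rack02"},
    {"hostId": "d4", "rackId": "/rack02"}
  ]
}
"""


def find_duplicate_hosts(items):
    """Return the items whose ipAddress occurs more than once, in input order."""
    # First pass: count how often each address occurs (entries without one are ignored).
    counts = Counter(item["ipAddress"] for item in items if "ipAddress" in item)
    # Second pass: keep the full item, with all its keys, when its address repeats.
    return [item for item in items if counts[item.get("ipAddress")] > 1]


duplicate_hosts = find_duplicate_hosts(json.loads(raw)["items"])
for host in duplicate_hosts:
    print(json.dumps(host, indent=2))
```

The result is a flat list in the original order, so each duplicate is printed with the rest of its keys and values rather than only the IP address.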