How to count the total unique ids in a nested list?

Question

I am having a problem counting the total number of unique ids in a nested list.

Nested list:

[
   [
      {
         "id": "a",
         "label": "Truck",
         "annotation": "vehicle",
         "visible": "No",
         "label2": "Truck",
         "shape": "rectangle",
         "x": 4, 
         "y": 500,
         "height": 200, 
         "width": 300
      },
      {
         "id": "b",
         "label": "Truck",
         "annotation": "vehicle",
         "visible": "No",
         "label2": "Truck",
         "shape": "rectangle",
         "x": 3, 
         "y": 400,
         "height": 250, 
         "width": 360
      },
      ...
   ],
   [
      {
         "id": "a",
         "label": "Truck",
         "annotation": "vehicle",
         "visible": "No",
         "label2": "Truck",
         "shape": "rectangle",
         "x": 4, 
         "y": 500,
         "height": 200, 
         "width": 300
      },
      {
         "id": "b",
         "label": "Truck",
         "annotation": "vehicle",
         "visible": "No",
         "label2": "Truck",
         "shape": "rectangle",
         "x": 3, 
         "y": 400,
         "height": 250, 
         "width": 360
      },
      ...
   ],
   ...
]

Currently, it keeps on printing out the result below, which is not what I want:

id: 1,
label: 1,
annotation: 1,
visible: 1,
label2: 1,
shape: 1,
x: 1, 
y: 1,
height: 1, 
width: 1
...
id: 1,
label: 1,
annotation: 1,
visible: 1,
label2: 1,
shape: 1,
x: 1, 
y: 1,
height: 1, 
width: 1

How can I count each id ("a" and "b") only once across this nested list of dictionaries, without using pandas?

Output I do want:

Unique id: 2

Code:

import json
import os
import pandas as pd
from itertools import chain

path = 'mypath/json_name.json'
size = os.path.getsize(path)

def func1(data):
    c = {}
    for key, value in data.items():
        try:
            c[key].append(value)
        except KeyError:
            c[key] = [value]
    for key, value in c.items():
        print("{0}:{1}".format(key, len(set(value))))


def totalUniqueId(data):
    for inner_list in data:
        for inner_dict in inner_list:
            func1(inner_dict)

with open('json_name.json') as json_file:
    if size > 13000:
        json_file.seek(0)
        test_data = json.load(json_file)
        totalUniqueId(test_data)

Resources I used:

Don't you just want `unique = set(d['id'] for sublist in test_data for d in sublist)` then `len(unique)`? — Mark, Jan 26 '22 at 22:24
If you only care about the `id`, why are you looping over all the items in the dictionary? — Barmar, Jan 26 '22 at 22:24
You're calling `func1()` separately on each dictionary. There will never be any duplicates in a single dictionary. — Barmar, Jan 26 '22 at 22:25
Hi @Mark, it worked. How can I get your code to work using the traditional nested loop instead of using the set() on the outside? — SL42, Jan 26 '22 at 22:32
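Barmar's point is that `func1` only ever sees a single dictionary, so every per-key set has exactly one element. Here is a minimal sketch of the question's per-key idea with the values gathered across all dictionaries before counting. The helper name `count_unique_per_key` and the small `sample` list are hypothetical stand-ins for the JSON file, and the sketch assumes every value is hashable (strings and numbers, as in the question's data):

```python
from collections import defaultdict

def count_unique_per_key(data):
    """Count distinct values per key across every dict in a list of lists."""
    values_by_key = defaultdict(set)  # key -> set of distinct values seen so far
    for inner_list in data:
        for inner_dict in inner_list:
            for key, value in inner_dict.items():
                values_by_key[key].add(value)
    # Only count once every dictionary has been visited.
    return {key: len(values) for key, values in values_by_key.items()}

# Hypothetical sample standing in for json.load(json_file).
sample = [
    [{"id": "a", "label": "Truck"}, {"id": "b", "label": "Truck"}],
    [{"id": "a", "label": "Truck"}, {"id": "b", "label": "Truck"}],
]

counts = count_unique_per_key(sample)
print(f"Unique id: {counts['id']}")  # Unique id: 2
```

Because the sets live outside the per-dictionary loop, repeated ids from different sublists collapse into one entry.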

martineau · Answer 1 · 2022-01-30T19:36:02.760

The simplest way would be to put the ids in a set and use its length:

import json

with open('json_name.json') as json_file:
    data = json.load(json_file)

unique_ids = set()
for sublist in data:
    for obj in sublist:
        unique_ids.add(obj['id'])

print(f'Unique ids: {len(unique_ids)}')

You could do the same thing with a one-liner called a set comprehension:

unique_ids = {obj['id'] for sublist in data for obj in sublist}
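The question's code already imports `chain` from `itertools`. A variant that is not part of this answer flattens the outer list with `chain.from_iterable`, so a single loop visits every dictionary. The `data` literal below is a hypothetical stand-in for the loaded JSON:

```python
from itertools import chain

# Hypothetical sample standing in for json.load(json_file).
data = [
    [{"id": "a", "label": "Truck"}, {"id": "b", "label": "Truck"}],
    [{"id": "a", "label": "Truck"}, {"id": "b", "label": "Truck"}],
]

# chain.from_iterable yields the dicts of each sublist in turn,
# so the comprehension needs only one "for" clause.
unique_ids = {obj['id'] for obj in chain.from_iterable(data)}
print(f'Unique ids: {len(unique_ids)}')  # Unique ids: 2
```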

score 0 · Answer 2 · answered Jan 26 '22 at 22:31

If I understand what you need correctly, one solution would be to store all of the ids in a temporary list and then use Counter to count the occurrences of each unique id in that list.

Something like this.

from collections import Counter
ids = []
for x in l:
    for y in x:
        ids.append(y['id'])
print(Counter(ids))

Here is that code run on an example nested list:

l = [
   [
      {
         "id": "a",
         "label": "Truck",
         "annotation": "vehicle",
      },
      {
         "id": "b",
         "label": "Truck",
         "annotation": "vehicle",
      },
   ],
   [
      {
         "id": "a",
         "label": "Truck",
         "annotation": "vehicle",
      },
      {
         "id": "b",
         "label": "Truck",
         "annotation": "vehicle",
      },
   ],
]

from collections import Counter
ids = []
for x in l:
    for y in x:
        ids.append(y['id'])
print(Counter(ids))

This prints:

Counter({'a': 2, 'b': 2})

I think `len(set(ids))` is closer to what OP is asking. — Tobi208, Jan 26 '22 at 22:44
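Both routes give the number the question asks for: a `Counter` is a dict keyed by the distinct ids, so its length equals `len(set(ids))`. A short sketch using the same example list `l`:

```python
from collections import Counter

# Same shape as the example list above (extra keys omitted).
l = [
    [{"id": "a", "label": "Truck"}, {"id": "b", "label": "Truck"}],
    [{"id": "a", "label": "Truck"}, {"id": "b", "label": "Truck"}],
]

ids = [y['id'] for x in l for y in x]  # ['a', 'b', 'a', 'b']
id_counts = Counter(ids)               # occurrences per id

# One key per distinct id, so len() gives the unique count.
print(f'Unique id: {len(id_counts)}')   # Unique id: 2
print(len(set(ids)) == len(id_counts))  # True
```

The `Counter` is only worth building if the per-id occurrence counts are also needed; otherwise the set alone is enough.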
