I have a list of dictionaries and the format looks like this:

dict = [{
  "users": {
    "user_a": [{
      "email": [ "aaa1@email.com", "aaa2@email.com" ]
    }],
    "user_b": [{ 
      "email": [ "bbb1@email.com" ]
    }]
  },
  "class": "class_A"
},
{
  "users": {
    "user_d": [{
      "email": [ "ddd1@email.com" ]
    }],
    "user_c": [{ 
      "email": [ "aaa1@email.com", "ccc@email.com" ]
    }]
  },
  "class": "class_B"
}]

I want to find the key (user name) whose value contains email address 'aaa1@email.com' as an example, so the result would be:

class_A, user_a
class_B, user_c

I was trying it this way:

for key, value in enumerate(dict):
  if key =="users":
    if value in "aaa1":

but I'm lost from here. How can I get the keys by values?

Would appreciate your help.

Brooke
  • Do you want only the first user name that comes up, or do you want a list of matching users from the entire list of dictionaries? – luther Dec 11 '19 at 18:20
  • ``"user_a": [{ "email": [ "aaa1@email.com", "aaa2@email.com" ] }]`` do you want this as a list? – Nasif Imtiaz Ohi Dec 11 '19 at 18:21
  • This data structure is extremely awkward to work with and any email lookups will be slow. What's the point of each user having a single-element list to store an object with the `"email"` key? `"users"` seems more useful as a list and why not key each `"users"` object by class? The structure prohibits O(1) lookup as it's currently organized anyway so no matter what info you want, it's slow. I can't see the motivation for all these layers of indirection. More info about your use case (are you doing repeated lookups?) would be helpful. – ggorlen Dec 11 '19 at 18:28
  • @luther I want all matching users. – Brooke Dec 11 '19 at 19:28

6 Answers

When you call enumerate(dict), you are not getting the keys and values of a dictionary. Your variable is a list, so Python gives you each index of the list together with the element stored at that index.

So on the first iteration of the loop you'll get:

for index, value in enumerate(my_dict):
    print("index is {}".format(index))
    print("value is {}".format(value))
# first iteration prints:
# index is 0
# value is {"users": ... }

So you'll need to start looking inside the value to get your keys. For the first element, for example:

value["users"]["user_a"]

As mentioned in some of the comments, the structure is really awkward to work with. You'd do well to simplify it and remove unnecessary layers if you can.
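If restructuring is an option and you expect repeated lookups, one possible direction is to build an inverted index once and then look emails up directly. This is only a minimal sketch under that assumption; `build_email_index` and `email_index` are hypothetical names that do not appear in the original post.

```python
from collections import defaultdict

# Sample data copied from the question (named `data` to avoid shadowing dict).
data = [
    {
        "users": {
            "user_a": [{"email": ["aaa1@email.com", "aaa2@email.com"]}],
            "user_b": [{"email": ["bbb1@email.com"]}],
        },
        "class": "class_A",
    },
    {
        "users": {
            "user_d": [{"email": ["ddd1@email.com"]}],
            "user_c": [{"email": ["aaa1@email.com", "ccc@email.com"]}],
        },
        "class": "class_B",
    },
]


def build_email_index(records):
    """Map each email address to the (class, user) pairs that list it."""
    index = defaultdict(list)
    for record in records:
        for user, entries in record["users"].items():
            for entry in entries:
                for email in entry["email"]:
                    index[email].append((record["class"], user))
    return index


email_index = build_email_index(data)
print(email_index["aaa1@email.com"])
# [('class_A', 'user_a'), ('class_B', 'user_c')]
```

Building the index still walks the whole structure once, but each later lookup is a single dictionary access. Use `email_index.get(address, [])` for addresses that may be absent, so the defaultdict does not grow with empty entries.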

Arturo

Assuming you are stuck with the current data representation, you can avoid a lot of the trouble caused by the nesting by using the flatten_data function from my answer here. It can transform your data structure into a dictionary like this:

{(0, 'class'): 'class_A',
 (0, 'users', 'user_a', 0, 'email', 0): 'aaa1@email.com',
 (0, 'users', 'user_a', 0, 'email', 1): 'aaa2@email.com',
 (0, 'users', 'user_b', 0, 'email', 0): 'bbb1@email.com',
 (1, 'class'): 'class_B',
 (1, 'users', 'user_c', 0, 'email', 0): 'aaa1@email.com',
 (1, 'users', 'user_c', 0, 'email', 1): 'ccc@email.com',
 (1, 'users', 'user_d', 0, 'email', 0): 'ddd1@email.com'}

This is a bit easier to handle since now you are dealing with a key which is a sequence of indices only some of which you care about, and the element is either the class or an email.

The following solution just goes over all fields, skipping only the "class" entries, since everything else is an email.

data = [{'users': {'user_a': [{'email': ['aaa1@email.com', 'aaa2@email.com']}], 'user_b': [{'email': ['bbb1@email.com']}]}, 'class': 'class_A'}, {'users': {'user_d': [{'email': ['ddd1@email.com']}], 'user_c': [{'email': ['aaa1@email.com', 'ccc@email.com']}]}, 'class': 'class_B'}]

# traverse and flatten_data are copied from https://stackoverflow.com/a/36582214/5827215
def traverse(obj, prev_path = "obj", path_repr = "{}[{!r}]".format):
    if isinstance(obj,dict):
        it = obj.items()
    elif isinstance(obj,list):
        it = enumerate(obj)
    else:
        yield prev_path,obj
        return
    for k,v in it:
        yield from traverse(v, path_repr(prev_path,k), path_repr)

def _tuple_concat(tup, idx):
    return (*tup, idx)   
def flatten_data(obj):
    """converts nested dict and list structure into a flat dictionary with tuple keys
    corresponding to the sequence of indices to reach particular element"""
    return dict(traverse(obj, (), _tuple_concat))


# !! THIS IS FOR YOU

def extract_groups(flattened_data, matching_email):
    for path, elem in flattened_data.items():
        # path will have format like (0, 'users', 'user_b', 0, 'email', 0)
        # elem is an email address

        # skip class mentions, we will retrieve these as needed
        if len(path) == 2 and path[1] == "class":
            continue
        # final element will match the given email?
        if elem == matching_email:
            # unpack useful elements of path
            [cls_idx, _, username, *_] = path
            cls = flattened_data[cls_idx, 'class']
            yield cls, username


new_data = flatten_data(data)
##import pprint
##pprint.pprint(new_data)
print(*extract_groups(new_data, "aaa1@email.com"), sep="\n")

This works for your sample, outputting:

('class_A', 'user_a')
('class_B', 'user_c')

However, any extra fields would cause problems, since the function would visit them and treat them as emails. The extracting function should therefore be written to rely on consistent structure in the data. Using path[2] to refer to the user id may not be stable, but there may be another way of writing it.
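One way to rely on the structure more explicitly is to accept only paths shaped like (index, 'users', username, n, 'email', m). The sketch below is an illustration, not part of the original answer: `extract_groups_strict` is a hypothetical name, and the "phone" entry is an invented extra field added only to show that it is ignored.

```python
# Flattened form of the sample data, as shown earlier, plus one hypothetical
# extra field ("phone") added here only to demonstrate the filtering.
flattened = {
    (0, "class"): "class_A",
    (0, "users", "user_a", 0, "email", 0): "aaa1@email.com",
    (0, "users", "user_a", 0, "email", 1): "aaa2@email.com",
    (0, "users", "user_a", 0, "phone"): "aaa1@email.com",  # not an email slot
    (0, "users", "user_b", 0, "email", 0): "bbb1@email.com",
    (1, "class"): "class_B",
    (1, "users", "user_c", 0, "email", 0): "aaa1@email.com",
    (1, "users", "user_c", 0, "email", 1): "ccc@email.com",
    (1, "users", "user_d", 0, "email", 0): "ddd1@email.com",
}


def extract_groups_strict(flattened_data, matching_email):
    """Yield (class, username) only for well-formed email paths."""
    for path, elem in flattened_data.items():
        # Anything that is not a 6-part path cannot be an email entry.
        if len(path) != 6:
            continue
        cls_idx, section, username, _, field, _ = path
        if section != "users" or field != "email":
            continue
        if elem == matching_email:
            yield flattened_data[cls_idx, "class"], username


print(list(extract_groups_strict(flattened, "aaa1@email.com")))
# [('class_A', 'user_a'), ('class_B', 'user_c')]
```

This still assumes every record has a "class" entry and that emails always sit at the same depth. A user who lists the same address twice would be yielded twice.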

Tadhg McDonald-Jensen

You can use a list comprehension:

data = [{'users': {'user_a': [{'email': ['aaa1@email.com', 'aaa2@email.com']}], 'user_b': [{'email': ['bbb1@email.com']}]}, 'class': 'class_A'}, {'users': {'user_d': [{'email': ['ddd1@email.com']}], 'user_c': [{'email': ['aaa1@email.com', 'ccc@email.com']}]}, 'class': 'class_B'}]
email = "aaa1@email.com"
result = [[i['class'], j] for i in data for j, k in i['users'].items() if any(email in x['email'] for x in k)]

Output:

[['class_A', 'user_a'], ['class_B', 'user_c']]
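
For readers who find the one-liner dense, here is the same logic unrolled into explicit loops. It is meant only as a reading aid; the variable names `record`, `user`, and `entries` are mine and stand in for `i`, `j`, and `k` above.

```python
data = [
    {
        "users": {
            "user_a": [{"email": ["aaa1@email.com", "aaa2@email.com"]}],
            "user_b": [{"email": ["bbb1@email.com"]}],
        },
        "class": "class_A",
    },
    {
        "users": {
            "user_d": [{"email": ["ddd1@email.com"]}],
            "user_c": [{"email": ["aaa1@email.com", "ccc@email.com"]}],
        },
        "class": "class_B",
    },
]
email = "aaa1@email.com"

result = []
for record in data:  # `i` in the comprehension
    for user, entries in record["users"].items():  # `j, k` in the comprehension
        # entries is the list wrapping the {"email": [...]} dictionaries
        if any(email in entry["email"] for entry in entries):
            result.append([record["class"], user])

print(result)
# [['class_A', 'user_a'], ['class_B', 'user_c']]
```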
Ajax1234

def get_users_by_email(data, email):
    results = []
    for record in data:
        for user, details in record["users"].items():
            emails = details[0]["email"]
            if email in emails:
                results.append((record["class"], user))
    return results


# d is the question's list of dictionaries, renamed so it no longer shadows the built-in dict
print(get_users_by_email(d, "aaa1@email.com"))
# [('class_A', 'user_a'), ('class_B', 'user_c')]

Avoid shadowing built-in names:

dict = {...
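
To illustrate why shadowing matters, here is a small sketch of what happens once the name dict is rebound to a list. The function `demo_shadowing` and the name `records` are hypothetical and exist only for this demonstration. The rebinding is kept inside a function so it does not leak into the rest of the module.

```python
def demo_shadowing():
    dict = [{"class": "class_A"}]  # rebinds the name, as in the question
    try:
        dict(a=1)  # now tries to call a list, not the built-in type
    except TypeError as exc:
        return str(exc)


error_message = demo_shadowing()
print(error_message)

# With a descriptive name instead, the built-in stays available.
records = [{"class": "class_A"}]
lookup = dict(a=1)
print(lookup)  # {'a': 1}
```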
RafalS

You can try this:

for d in dict:
    for key in d['users'].keys():
        if 'aaa1@email.com' in d['users'][key][0]['email']:
            print(d['class'], key)
Nasif Imtiaz Ohi

Try:

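# loop over every record and its users; dict[0].items() alone only yields "users" and "class"
for record in dict:
    for k, d in record['users'].items():
        # substring search on the text form of the user's data
        if str(d).find('aaa1') != -1:
            print(record['class'], k)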

An alternative, which would not work for your nested structure but may be useful for someone else with a flat dictionary:

list(dict.keys())[list(dict.values()).index('value')]

To print it on screen:

print(list(dict.keys())[list(dict.values()).index('value')])
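
As a rough illustration of that flat-dictionary lookup, here is a sketch with a made-up mapping. `primary_email` and its contents are hypothetical. Note that `list.index` raises ValueError when the value is absent, so a `next()` call with a default is shown as one possible way around that.

```python
# Hypothetical flat mapping, used only for illustration.
primary_email = {"user_a": "aaa1@email.com", "user_b": "bbb1@email.com"}

keys = list(primary_email.keys())
values = list(primary_email.values())

# Keys and values come back in the same order, so the positions line up.
found = keys[values.index("bbb1@email.com")]
print(found)  # user_b

# next() with a default returns None instead of raising when nothing matches.
match = next((k for k, v in primary_email.items() if v == "zzz@email.com"), None)
print(match)  # None
```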