Python Group by count

Question

Given a dictionary, I need some way to do the following:

The dictionary holds names, gender, occupation, and salary. For each name I look up in the dictionary, I need to check that there are no more than 5 other employees with the same name, gender and occupation. If so, I output it. Otherwise, I remove it.

Any help or resources would be appreciated!

What I researched:

from collections import Counter

count = Counter(tok['Name'] for tok in input_file)

This counts the number of occurrences of each name (e.g. Bob: 2, Amy: 4). However, I need to include the gender and occupation as well (e.g. Bob, M, Salesperson: 2; Amy, F, Manager: 1).
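One way to get there, as a minimal sketch: key the Counter on a (name, gender, occupation) tuple instead of the name alone. The sample rows, the helper names `group_key` and `filter_rows`, and the `max_group_size` parameter are assumptions for illustration; the rows are assumed to be dicts with 'Name', 'Gender' and 'Occupation' keys, as `csv.DictReader` would produce.

```python
from collections import Counter

# Hypothetical sample rows, shaped like csv.DictReader output.
rows = [
    {'Name': 'Bob Billy', 'Gender': 'M', 'Occupation': 'Salesperson', 'Salary': '55000'},
    {'Name': 'Bob Billy', 'Gender': 'M', 'Occupation': 'Salesperson', 'Salary': '61000'},
    {'Name': 'Bob Billy', 'Gender': 'M', 'Occupation': 'Manager', 'Salary': '250000'},
    {'Name': 'Amy Adams', 'Gender': 'F', 'Occupation': 'Manager', 'Salary': '90000'},
]


def group_key(row):
    """Build the tuple that identifies a group; salary is left out on purpose."""
    return (row['Name'], row['Gender'], row['Occupation'])


def filter_rows(rows, max_group_size=5):
    """Keep only rows whose group has at most max_group_size members."""
    counts = Counter(group_key(row) for row in rows)
    return [row for row in rows if counts[group_key(row)] <= max_group_size]


counts = Counter(group_key(row) for row in rows)
print(counts[('Bob Billy', 'M', 'Salesperson')])  # 2
print(len(filter_rows(rows, max_group_size=1)))   # 2
```

If `input_file` is a `csv.DictReader`, wrap it in `list(...)` first: the reader can only be iterated once, and this sketch passes over the rows twice.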

What are the {`key`:`value`} pairs? How have you stored the data in the `dict`? If you have yet to attempt this problem, why not try an `Object-Oriented Design`? — tmj, Nov 23 '13 at 20:52
http://stackoverflow.com/questions/20150561/class-or-object-instead-of-dictionaries-in-python-2/20151058#20151058 — tmj, Nov 23 '13 at 20:53
Kind of depends on what your `dict` looks like. Show an example. — roippi, Nov 23 '13 at 20:54
I read it from a CSV file, so it looks something like this: {'Name': 'Bob Billy', 'Gender': 'M', 'Occupation': 'Salesperson', 'Salary': '55000'} {'Name': 'Bob Billy', 'Gender': 'M', 'Occupation': 'Manager', 'Salary': '250000'} etc. — Nitrodbz, Nov 23 '13 at 20:59
Btw do you remove the whole dictionary or just the conflicting key,value pairs? — tmj, Nov 23 '13 at 21:02
Just the conflicting key,value pair. That part of the question isn't the most important. I just need a way to determine whether the count is less than 5, given that there are other rows with the same name, gender and occupation — Nitrodbz, Nov 23 '13 at 21:05
There is a lot of ambiguity in your statement. If you post code of your attempts it might become more clear what you're asking — Ryan Saxe, Nov 23 '13 at 21:05

tmj · Accepted Answer · 2013-11-23T21:45:35.633

Checking whether the dictionary has 5 or more (key, value) pairs in which the employee's name, gender and occupation are the same is quite simple. Removing all such inconsistencies is trickier.

# data = {}
# key = 'UID'
# value = ('Name','Male','Accountant','20000')
# data[key] = value

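def consistency(dictionary):
    # Keep only (name, gender, occupation) from each value; salary is ignored.
    # dict.values() replaces the Python 2-only dict.itervalues().
    temp_list_of_values_we_care_about = [(x[0], x[1], x[2]) for x in dictionary.values()]
    temp_dict = {}

    # Count how many entries share each (name, gender, occupation) triple.
    for val in temp_list_of_values_we_care_about:
        if val in temp_dict:
            temp_dict[val] += 1
        else:
            temp_dict[val] = 1

    # False if any triple occurs 5 or more times. default=0 makes an
    # empty dictionary count as consistent instead of raising ValueError.
    return max(temp_dict.values(), default=0) < 5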
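The same check can be written more compactly with `collections.Counter`. This is a sketch rather than a drop-in replacement: the name `is_consistent`, the `limit` parameter and the sample data are hypothetical, and the values are assumed to follow the (name, gender, occupation, salary) layout from the comment block above.

```python
from collections import Counter


def is_consistent(dictionary, limit=5):
    """Return True if no (name, gender, occupation) triple occurs `limit` or more times."""
    # tuple(...) keeps the key hashable even if the values are lists.
    counts = Counter(tuple(value[:3]) for value in dictionary.values())
    # default=0 covers the empty-dictionary case.
    return max(counts.values(), default=0) < limit


# Hypothetical sample data keyed by a unique ID.
data = {
    'u1': ('Bob Billy', 'M', 'Salesperson', '55000'),
    'u2': ('Bob Billy', 'M', 'Salesperson', '61000'),
    'u3': ('Amy Adams', 'F', 'Manager', '90000'),
}

print(is_consistent(data))           # True
print(is_consistent(data, limit=2))  # False
```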

To actually get a dictionary with those particular values removed, there are two ways:

1. Edit and update the original dictionary (doing it in place).
2. Create a new dictionary and add only those values which satisfy our constraint.

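def consistency(dictionary):
    # Count entries per (name, gender, occupation) triple; salary is ignored.
    # dict.values() replaces the Python 2-only dict.itervalues().
    temp_list_of_values_we_care_about = [(x[0], x[1], x[2]) for x in dictionary.values()]
    temp_dict = {}

    for val in temp_list_of_values_we_care_about:
        if val in temp_dict:
            temp_dict[val] += 1
        else:
            temp_dict[val] = 1

    # Build a new dictionary, keeping only entries whose triple occurs
    # at most 5 times. (This keeps groups of exactly 5; change <= to <
    # if such groups should be dropped as well.)
    new_dictionary = {}
    for key in dictionary:
        value = dictionary[key]
        temp = (value[0], value[1], value[2])

        if temp_dict[temp] <= 5:
            new_dictionary[key] = value

    return new_dictionary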

P.S. I have chosen the second way because it is much easier. The first way is more awkward: keys cannot be deleted from a dictionary while it is being iterated over, so the keys to remove have to be collected in a separate pass first.
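For reference, here is a minimal sketch of the first (in-place) way. The function name, the `limit` parameter and the sample data are hypothetical; the value layout is the tuple format assumed at the top of this answer. Keys are collected before deletion because removing entries from a dict while iterating over it raises RuntimeError in Python 3.

```python
from collections import Counter


def remove_large_groups_in_place(dictionary, limit=5):
    """Delete entries whose (name, gender, occupation) group has more than `limit` members.

    Mutates `dictionary` and returns the list of removed keys.
    """
    counts = Counter(tuple(value[:3]) for value in dictionary.values())
    # First pass: decide what to delete without touching the dict.
    keys_to_remove = [
        key for key, value in dictionary.items()
        if counts[tuple(value[:3])] > limit
    ]
    # Second pass: the actual deletions.
    for key in keys_to_remove:
        del dictionary[key]
    return keys_to_remove


# Hypothetical sample data; limit=1 is used only to make the effect visible.
data = {
    'u1': ('Bob Billy', 'M', 'Salesperson', '55000'),
    'u2': ('Bob Billy', 'M', 'Salesperson', '61000'),
    'u3': ('Amy Adams', 'F', 'Manager', '90000'),
}
removed = remove_large_groups_in_place(data, limit=1)
print(removed)       # ['u1', 'u2']
print(sorted(data))  # ['u3']
```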

The remove option isn't as important to me (bonus feature) but thanks! — Nitrodbz, Nov 23 '13 at 21:38
@Nitrodbz If you feel that the answer is complete, you can accept it. — tmj, Nov 23 '13 at 21:54
