5

I'm working in python. Is there a way to count how many times values in a dictionary are found with more than one key, and then return a count?

So if for example I had 50 values and I ran a script to do this, I would get a count that would look something like this:

1: 23  
2: 15  
3: 7  
4: 5  

The above would be telling me that 23 values appear in 1 key, 15 values appear in 2 keys, 7 values appear in 3 keys and 5 values appear in 4 keys.

Also, would this question change if there were multiple values per key in my dictionary?

Here is a sample of my dictionary (it's bacteria names):

{'0': ['Pyrobaculum'], '1': ['Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium'], '3': ['Thermoanaerobacter', 'Thermoanaerobacter'], '2': ['Helicobacter', 'Mycobacterium'], '5': ['Thermoanaerobacter', 'Thermoanaerobacter'], '4': ['Helicobacter'], '7': ['Syntrophomonas'], '6': ['Gelria'], '9': ['Campylobacter', 'Campylobacter'], '8': ['Syntrophomonas'], '10': ['Desulfitobacterium', 'Mycobacterium']}

So from this sample, there are 8 unique values, and the ideal feedback I would get is:

1:4
2:3
3:1

So 4 bacteria names are only in one key, 3 bacteria are found in two keys and 1 bacteria is found in three keys.

Jen
  • The only way to do it is to iterate through the values. No fancy short cuts. – Paul Tomblin Sep 03 '13 at 00:24
  • @PaulTomblin would you mind suggesting a way to iterate through the values? Would it include something like `for value in dictionary.values():`? – Jen Sep 03 '13 at 00:30

3 Answers

6

So unless I'm reading this wrong you want to know:

  • For each of the values in the original dictionary, how many times does each different count of values occur?
  • In essence what you want is the frequency of the values in the dictionary

I took a less elegant approach than the other answers, but have broken the problem down for you into individual steps:

d = {'0': ['Pyrobaculum'], '1': ['Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium'], '3': ['Thermoanaerobacter', 'Thermoanaerobacter'], '2': ['Helicobacter', 'Mycobacterium'], '5': ['Thermoanaerobacter', 'Thermoanaerobacter'], '4': ['Helicobacter'], '7': ['Syntrophomonas'], '6': ['Gelria'], '9': ['Campylobacter', 'Campylobacter'], '8': ['Syntrophomonas'], '10': ['Desulfitobacterium', 'Mycobacterium']}

# Iterate through and find out how many keys each value occurs under
vals = {}   # A dictionary to store how often each value occurs.
for i in d.values():
    for j in set(i):   # Convert to a set to remove duplicates within a key
        # If we've seen this value, increment its count;
        # otherwise we get the default of 0 and increment that
        vals[j] = 1 + vals.get(j, 0)
print(vals)

# Iterate through each possible frequency and find how many values have that count.
counts = {}   # A dictionary to store the final frequencies.
# We will iterate from 0 to the maximum count; frequencies that no value has
# are skipped when printing
for i in range(0, max(vals.values()) + 1):
    # Find all values that have the current frequency, count them
    # and add them to the frequency dictionary
    counts[i] = len([x for x in vals.values() if x == i])

for key in sorted(counts.keys()):
    if counts[key] > 0:
        print("%s : %s" % (key, counts[key]))

You can also test this code on codepad.
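
The second loop above rescans `vals.values()` once for every possible frequency. As a minimal sketch of the same idea in a single pass, assuming Python 3 and the same dict-of-lists shape (the function name `count_key_frequencies` and the small `sample` dictionary are illustrative and not part of the original answer):

```python
def count_key_frequencies(d):
    """Map 'number of keys a value appears under' to 'how many values have that count'."""
    # Step 1: for each distinct value, count the keys whose list contains it.
    keys_per_value = {}
    for items in d.values():
        for item in set(items):  # set() ignores duplicates within one key
            keys_per_value[item] = keys_per_value.get(item, 0) + 1

    # Step 2: tally those counts in one pass over the values.
    frequencies = {}
    for n in keys_per_value.values():
        frequencies[n] = frequencies.get(n, 0) + 1
    return frequencies


# Hypothetical sample data, only for illustration
sample = {'0': ['a'], '1': ['a', 'a', 'b'], '2': ['b', 'c'], '3': ['a']}
for n, how_many in sorted(count_key_frequencies(sample).items()):
    print("%d: %d" % (n, how_many))
```

Here `a` appears under three keys, `b` under two and `c` under one, so each frequency from 1 to 3 is reported once.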

  • YAY!!!! This worked amazingly! Thank you! As a second thing, is it possible for it not to count duplicates within a key (or can I easily remove these duplicates before the `vals = {}` step)? – Jen Sep 03 '13 at 01:25
5

If I understand correctly, you want to count the counts of dictionary values. If the values are countable by collections.Counter, you just need to call Counter on the dictionary's values and then again on the first counter's values. Here is an example using a dictionary where the keys are range(100) and the values are random between 0 and 10:

import random
from collections import Counter
d = dict(enumerate([str(random.randint(0, 10)) for _ in range(100)]))
counter = Counter(d.values())
counts_counter = Counter(counter.values())
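
Because the random dictionary gives different results on every run, a small fixed example may make the two counting steps easier to follow. This is only a sketch with made-up data (the keys `k1` to `k4` and the values are hypothetical):

```python
from collections import Counter

# Hypothetical data: each key maps to a single string value
d = {'k1': 'x', 'k2': 'x', 'k3': 'y', 'k4': 'z'}

# First count: how many keys hold each value
counter = Counter(d.values())

# Second count: how many values share each key count
counts_counter = Counter(counter.values())

print(counter)
print(counts_counter)
```

In this data `x` sits under two keys while `y` and `z` sit under one each, so the second counter records two values with a count of 1 and one value with a count of 2.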

EDIT:

After the sample dictionary was added to the question, you need to do the first count in a slightly different way (d is the dictionary in the question):

from collections import Counter
c = Counter()
for v in d.values():
    c.update(set(v))
Counter(c.values())
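
As a hedged, self-contained sketch of the same approach applied to the sample dictionary from the question (assuming Python 3; the 14 repeated 'Mycobacterium' entries under key '1' are written as a list multiplication for brevity, and the variable name `result` is illustrative):

```python
from collections import Counter

d = {
    '0': ['Pyrobaculum'],
    '1': ['Mycobacterium'] * 14,
    '2': ['Helicobacter', 'Mycobacterium'],
    '3': ['Thermoanaerobacter', 'Thermoanaerobacter'],
    '4': ['Helicobacter'],
    '5': ['Thermoanaerobacter', 'Thermoanaerobacter'],
    '6': ['Gelria'],
    '7': ['Syntrophomonas'],
    '8': ['Syntrophomonas'],
    '9': ['Campylobacter', 'Campylobacter'],
    '10': ['Desulfitobacterium', 'Mycobacterium'],
}

# Count each name once per key, ignoring duplicates inside a key's list
c = Counter()
for v in d.values():
    c.update(set(v))

# Count how many names share each key count
result = Counter(c.values())
for n in sorted(result):
    print("%d:%d" % (n, result[n]))
```

This should print the 1:4, 2:3, 3:1 breakdown that the question asks for.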
Paulo Almeida
  • Yes! That's what I want, to "count the counts of dictionary values"! The only thing is, I actually have about 5000 keys, and the values are words, is it easy to change what you've posted to reflect this? Thanks so much for posting an answer! – Jen Sep 03 '13 at 00:52
  • @Jen, If you have a dictionary where values are strings, this should work. But I've seen your comment in another answer saying you have lists. That would be different. As 1_CR said, it would be helpful to see a sample of your dictionary. – Paulo Almeida Sep 03 '13 at 00:57
  • Does the dictionary itself even matter? Either a value appears or it doesn't. After a few steps this is reduced to integers anyhow. In fact the keys are even unnecessary. –  Sep 03 '13 at 01:02
  • @LegoStormtroopr, I was answering you, but the question was edited and it's a bit different now. – Paulo Almeida Sep 03 '13 at 01:14
2

You could use a Counter

>>> from collections import Counter
>>> d = dict(((1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)))
>>> d
{1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
>>> Counter(d.values())
Counter({1: 3, 2: 2, 3: 2})
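
Note that this relies on the values being hashable. A minimal sketch of what happens when the values are lists, and one possible workaround (the dictionary and the names `error` and `per_value` are hypothetical):

```python
from collections import Counter

# Hypothetical dictionary whose values are lists rather than single items
d = {1: ['a'], 2: ['a', 'b'], 3: ['b']}

error = None
try:
    Counter(d.values())  # lists are unhashable, so they cannot be counted directly
except TypeError as exc:
    error = exc
    print("Counter failed:", exc)

# Workaround: count the list elements instead, once per key
per_value = Counter(item for items in d.values() for item in set(items))
print(per_value)
```

Whether counting the individual elements is the right fix depends on what the lists represent, so treat this as one option rather than the only one.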
iruvar
  • Thanks for posting! I tried this but I got an error because my values are in lists, would that change how this works? – Jen Sep 03 '13 at 00:49
  • @Jen, Please add a sample part of your dictionary to the original post. – iruvar Sep 03 '13 at 00:51