I need to only return values higher than 1 in a function counting integer duplicates in a dict

Question

This is the code I have so far.

def find_duplicate_integers(arg):
    stats = {}
    for i in arg:
        if i > 1:
            if i in stats:
                stats[i] += 1
            else:
                stats[i] = 1
    return stats

This is the result I want

>>> find_duplicate_integers([1, 1, 3, 2, 3, 1, 0])
{1: 3, 3: 2}

But this is the result I get

>>> find_duplicate_integers([1, 1, 3, 2, 3, 1, 0])
{2: 1, 3: 2}

I apologize if this is due to a basic mistake, but I cannot figure out how to make this work. Any help would be greatly appreciated!

The result you want and how you're describing it are in conflict. It looks like you want to return the count of duplicate integers whose count > 1 not their value. To do that you need to remove the `if i > 1` and process the resulting stats map after your for loop completes. — Kurt Stutsman, Mar 10 '16 at 04:43
Your check for `i > 1` is meaningless here as `i` is the value; _not_ the count of duplicates. — Selcuk, Mar 10 '16 at 04:43
But I want the values that are less than 1 to be excluded from the count completely. — Jakob L, Mar 10 '16 at 04:52
If you say higher than 1, it means 1 is excluded. If you say not less than 1, it means 1 is included. Clarify your thoughts. Do you want to include or exclude 1? The code says to exclude it: `if i > 1`. That means `i` must be strictly greater than 1 to be counted. It's crystal clear. Now, regarding the entry for 2, it sounds like you want it excluded because there's only one of them, is that correct? If so, then you need to eliminate entries with a count of 1 (or avoid adding them in the first place). — Tom Karzes, Mar 10 '16 at 04:56
Simply add every element to the dict and then remove the keys which u dont want ...as simple as that... — Dark Matter, Mar 10 '16 at 04:56
@KINGJAL Then why do you expect `1: 3` in your desired output? — Selcuk, Mar 10 '16 at 05:04
Maybe I explained it wrong. I do want it to consider all integers in the list, but to not return the count for the integers that appear less than twice. — Jakob L, Mar 10 '16 at 05:22

Felix · Answer 1 · 2016-03-10T05:56:29.910

3

You can actually do that in one line:

def find_duplicate_integers(arg):
    return {i: arg.count(i) for i in set(arg) if arg.count(i) > 1}

If you care about runtime, there are faster ways to do it, because this version calls arg.count() for every distinct value in the list.

EDIT:

If you need it to be really fast, you can do it like this:

from collections import defaultdict
from random import SystemRandom
from timeit import Timer


def find_duplicate_integers3(arg):
    # Tally every element in a single pass; missing keys start at 0.
    d = defaultdict(int)
    for i in arg:
        d[i] += 1
    # Keep only the values that occur more than once.
    return {k: v for k, v in d.items() if v > 1}

rdev = SystemRandom()
numberList = [rdev.randint(0, 10 ** 3) for _ in range(1000)]

# find_duplicate_integers1 refers to Goodies's groupby-based function
# (defined in that answer as find_duplicate_integers).
t1 = Timer(lambda: find_duplicate_integers3(numberList))  # Mine
t2 = Timer(lambda: find_duplicate_integers1(numberList))  # Goodies's
print(t1.timeit(number=1000))  # => 0.42611347176268827
print(t2.timeit(number=1000))  # => 1.0357027557108174

EDIT2:

As donkopotamus pointed out, there's an even better (and faster) way to do it: collections.Counter

edited Mar 10 '16 at 05:56

answered Mar 10 '16 at 04:56


could you explain the colon `i:` in the list comprehension? – xvan Mar 10 '16 at 05:17
This is a dictionary comprehension. A dictionary contains key-value pairs that are separated by a colon in the dictionary comprehension. Good explanation can be found here: http://stackoverflow.com/a/14507637/3594526 – Felix Mar 10 '16 at 05:21
Nice! What goes around comes around :) – Goodies Mar 10 '16 at 06:03
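
To illustrate the difference xvan asked about, here is a minimal sketch contrasting a list comprehension with a dictionary comprehension. The sample list `numbers` and the variable names are made up for this illustration and do not come from the answer.

```python
numbers = [1, 1, 3, 2, 3, 1, 0]

# List comprehension: square brackets, one expression per element.
doubled = [n * 2 for n in numbers]

# Dictionary comprehension: curly braces, and `key: value` before `for`.
# Iterating over set(numbers) visits each distinct value once.
counts = {n: numbers.count(n) for n in set(numbers)}

print(doubled)  # [2, 2, 6, 4, 6, 2, 0]
print(counts)
```

The colon is what turns the braces into a dict: `n` becomes the key and `numbers.count(n)` becomes its value.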

score 1 · Accepted Answer · answered Mar 10 '16 at 05:07

If you want to know what is wrong with your code, look at the following change that I've made to your code.

def find_duplicate_integers(arg):
    stats = {}  # count of every integer seen so far
    res = {}    # only the integers seen more than once
    for i in arg:
        if i in stats:
            stats[i] += 1
            res[i] = stats[i]
        else:
            stats[i] = 1
    return res

print(find_duplicate_integers([1, 1, 3, 2, 3, 1, 0]))

First of all, you were not even looking at values such as 1, because your condition only lets through integers that are greater than 1:

if i > 1

This condition is not required (according to what I understand from your requirements).

Next, I have created a second dict, `res`, just to store the integers whose count is more than 1.

I stress, this is not the best way to solve this problem. I'm just trying to point out what was wrong in your code that gave you the results you were getting.
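
The approach suggested in the comments on the question (count every integer first, then drop the entries that appear only once) could be sketched as follows. This is an illustration rather than the answerer's code, and the helper name `find_duplicate_integers_two_pass` is made up here.

```python
def find_duplicate_integers_two_pass(arg):
    # Pass 1: count every integer, with no filtering on the value itself.
    stats = {}
    for i in arg:
        stats[i] = stats.get(i, 0) + 1
    # Pass 2: keep only the entries whose count (not value) exceeds 1.
    return {value: count for value, count in stats.items() if count > 1}


print(find_duplicate_integers_two_pass([1, 1, 3, 2, 3, 1, 0]))  # {1: 3, 3: 2}
```

Keeping the counting and the filtering separate makes it clear that the `> 1` test applies to the count, which is where the original `if i > 1` went wrong.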

score 1 · Answer 3 · answered Mar 10 '16 at 05:40

You can do this very easily using collections.Counter in Python's standard library:

import collections

def find_duplicate_integers(arg):
    return {k: v for k, v in collections.Counter(arg).items() if v > 1}

Then

>>> find_duplicate_integers([1, 1, 3, 2, 3, 1, 0])
{1: 3, 3: 2}

Goodies · Answer 4 · 2016-03-10T05:08:44.230

Use groupby from itertools.

from itertools import groupby

def find_duplicate_integers(numberlist, minimum=1):
    # groupby needs sorted input so that equal values are adjacent.
    repeats = {a: sum(1 for _ in b) for a, b in groupby(sorted(numberlist))}
    # minimum is exclusive: only counts strictly greater than it are kept.
    return {a: b for a, b in repeats.items() if b > minimum}

print(find_duplicate_integers([1, 1, 3, 2, 3, 1, 0]))  # => {1: 3, 3: 2}
print(find_duplicate_integers([1, 1, 3, 2, 3, 1, 0], minimum=2))  # => {1: 3}

Comparing to @caenyon's solution: my function is #1, his is #2.

from itertools import groupby
from random import SystemRandom
from timeit import Timer
rdev = SystemRandom()
numberList = [rdev.randint(0, 10**3) for _ in range(1000)]

t1 = Timer(lambda: find_duplicate_integers1(numberList))  # Mine
t2 = Timer(lambda: find_duplicate_integers2(numberList))  # caenyon's
print(t1.timeit(number=1000))  # => 0.7377041084807044
print(t2.timeit(number=1000))  # => 16.82846828367938

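Function #2 gets progressively slower relative to #1 as the list grows, because it calls arg.count(), which scans the entire list, for every distinct value.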
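
One way to see the growth is to time both approaches at several list sizes. The sketch below is an illustration under stated assumptions, not the benchmark used above: the function names `count_based` and `single_pass`, the list sizes, and the repeat count are all made up, and no timings are quoted because they depend on the machine.

```python
from collections import defaultdict
from random import randint
from timeit import timeit


def count_based(arg):
    # Calls list.count() for every distinct value; each call scans the whole list.
    return {i: arg.count(i) for i in set(arg) if arg.count(i) > 1}


def single_pass(arg):
    # Visits each element once, then filters the tallies.
    tallies = defaultdict(int)
    for i in arg:
        tallies[i] += 1
    return {k: v for k, v in tallies.items() if v > 1}


for size in (100, 1000, 2000):
    data = [randint(0, size) for _ in range(size)]
    assert count_based(data) == single_pass(data)  # same answer either way
    slow = timeit(lambda: count_based(data), number=3)
    fast = timeit(lambda: single_pass(data), number=3)
    print(f"n={size}: count_based={slow:.4f}s single_pass={fast:.4f}s")
```

The gap between the two columns should widen as `size` grows, since the count-based version does work proportional to the list length for each distinct value.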

wow, the time difference is bigger than i thought it would be... but like I said, I didn't care about runtime in my first post. I thought about making it faster and I found a solution, see the edit in my post ;) — Felix, Mar 10 '16 at 05:53
