
I'm trying to find duplicates in a list. I want to preserve the values and insert them into a tuple with their number of occurrences.

For example:

list_of_n = [2, 3, 5, 5, 5, 6, 2]
occurance_of_n = zip(set(list_of_n), [list_of_n.count(n) for n in set(list_of_n)])

[(2, 2), (3, 1), (5, 3), (6, 1)]

This works fine with small lists. My question is: as list_of_n gets larger, will I have to worry about arg1 and arg2 in zip(arg1, arg2) not lining up correctly, given that both are derived from the same set?

I.e., is there a conceivable future where I call zip() and it accidentally pairs an element from arg1 with the count of a different element in arg2?

(In case it's not clear: I'm converting the list to a set in arg2 for speed, and on the assumption that zip will behave better if arg1 is built the same way.)

Connor
  • Discard all this and use [`collections.Counter`](https://docs.python.org/3/library/collections.html#collections.Counter), which is much more efficient and doesn't have this issue. – user2357112 Jan 17 '18 at 18:41
  • You're creating two different `set` objects. There is no guarantee that they will be iterated over in the same order. If you define a single `set` beforehand, then there is such a guarantee. You could do `[(n, list_of_n.count(n)) for n in set(list_of_n)]` instead, or use a `Counter` – Patrick Haugh Jan 17 '18 at 18:43
  • Do not use `[list_of_n.count(n) for n in set(list_of_n)]`; that is a quadratic algorithm. There are linear algorithms for counting occurrences of items in a list, already implemented for you in `collections.Counter`. – juanpa.arrivillaga Jan 17 '18 at 18:45
  • See this: https://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list – keramat Jan 17 '18 at 18:51

1 Answer


Since your sample output preserves the order of appearance, you might want to go with a collections.OrderedDict to gather the counts:

from collections import OrderedDict

list_of_n = [2, 3, 5, 5, 5, 6, 2]
d = OrderedDict()
for x in list_of_n:
    d[x] = d.get(x, 0) + 1
occurance_of_n = list(d.items())
# [(2, 2), (3, 1), (5, 3), (6, 1)]
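On Python 3.7 and later, plain dicts preserve insertion order as well, so the same loop should work without OrderedDict. A minimal sketch, assuming that Python version; `counts` and `pairs` are just illustrative names:

```python
list_of_n = [2, 3, 5, 5, 5, 6, 2]

# Assumption: Python 3.7+, where plain dicts keep insertion order.
counts = {}
for x in list_of_n:
    # Start at 0 for unseen values, then add one per occurrence.
    counts[x] = counts.get(x, 0) + 1

pairs = list(counts.items())
print(pairs)  # [(2, 2), (3, 1), (5, 3), (6, 1)]
```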

If order does not matter, the appropriate approach is to use a collections.Counter:

from collections import Counter

occurance_of_n = list(Counter(list_of_n).items())
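If the goal is only the values that actually repeat, the Counter can be filtered afterwards. A rough sketch; the names `counts` and `duplicates` are mine, not from the question, and the printed order assumes Python 3.7+:

```python
from collections import Counter

list_of_n = [2, 3, 5, 5, 5, 6, 2]
counts = Counter(list_of_n)  # a single pass over the list

# Keep only the values that occur more than once.
duplicates = [(n, c) for n, c in counts.items() if c > 1]
print(duplicates)  # [(2, 2), (5, 3)]

# most_common(k) returns the k most frequent values, highest count first.
print(counts.most_common(1))  # [(5, 3)]
```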

Note that both approaches require only one iteration over the list. Your version could be amended to something like:

occurance_of_n = [(n, list_of_n.count(n)) for n in set(list_of_n)]
# e.g. [(2, 2), (3, 1), (5, 3), (6, 1)]; set iteration order is not guaranteed

but each call to list.count iterates over the entire initial list, once per unique element, which makes this quadratic in the worst case.
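As for the original zip question: the two set(list_of_n) calls create two separate set objects, and nothing in the language promises that they iterate in the same order. If you want to keep the zip shape, one option is to build the set once and use that same object for both arguments. This is only a sketch (`unique` and `pairs` are illustrative names), and it is still quadratic because of list.count:

```python
list_of_n = [2, 3, 5, 5, 5, 6, 2]

unique = set(list_of_n)  # build the set once and reuse it

# Both arguments iterate over the same unmodified set object,
# so value and count stay aligned.
pairs = list(zip(unique, [list_of_n.count(n) for n in unique]))

# Sanity check: every pair is (value, count of that value).
assert all(list_of_n.count(n) == c for n, c in pairs)
print(sorted(pairs))  # [(2, 2), (3, 1), (5, 3), (6, 1)]
```

Pairing value and count inside one comprehension avoids the alignment question entirely, since each tuple is built from a single loop variable.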

user2390182