Zipping a set and a list in Python

Question

I'm trying to find duplicates in a list. I want to preserve the values and insert them into a tuple with their number of occurrences.

For example:

list_of_n = [2, 3, 5, 5, 5, 6, 2]
occurance_of_n = zip(set(list_of_n), [list_of_n.count(n) for n in set(list_of_n)])

[(2, 2), (3, 1), (5, 3), (6, 1)]

This works fine with small sets. My question is: as list_of_n gets larger, will I have to worry about arg1 and arg2 in zip(arg1, arg2) not lining up correctly if they're the same set?

I.e., is there a conceivable future where I call zip() and it accidentally aligns index [0] of list_of_n in arg1 with some other index of list_of_n in arg2?

(In case it's not clear: I'm converting the list to a set in arg2 for speed, and in arg1 on the assumption that zip will behave better if both arguments come from the same set.)

Discard all this and use [`collections.Counter`](https://docs.python.org/3/library/collections.html#collections.Counter), which is much more efficient and doesn't have this issue. — user2357112, Jan 17 '18 at 18:41
You're creating two different `set` objects. There is no guarantee that they will be iterated over in the same order. If you define a single `set` beforehand, then there is such a guarantee. You could do `[(n, list_of_n.count(n)) for n in set(list_of_n)]` instead, or use a `Counter` — Patrick Haugh, Jan 17 '18 at 18:43
Do not use `[list_of_n.count(n) for n in set(list_of_n)]`; that is a quadratic algorithm. There are linear algorithms for counting occurrences of items in a list, already implemented for you in `collections.Counter`. — juanpa.arrivillaga, Jan 17 '18 at 18:45
See this: https://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list — keramat, Jan 17 '18 at 18:51

user2390182 · Accepted Answer · 2018-01-17T19:08:17.000

Since your sample output preserves the order of appearance, you might want to go with a collections.OrderedDict to gather the counts:

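from collections import OrderedDict

list_of_n = [2, 3, 5, 5, 5, 6, 2]
d = OrderedDict()  # remembers the order in which keys are first inserted
for x in list_of_n:
    d[x] = d.get(x, 0) + 1  # one pass: increment the running count for x
occurance_of_n = list(d.items())
# [(2, 2), (3, 1), (5, 3), (6, 1)]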

If order does not matter, the appropriate approach is using a collections.Counter:

from collections import Counter

occurance_of_n = list(Counter(list_of_n).items())
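As a minimal sketch of what Counter offers beyond plain counting: assuming Python 3.7 or later (where Counter, being a dict subclass, keeps keys in first-seen order; this does not hold on 2.7), items() comes out in order of first appearance, and most_common() sorts the pairs by count. The variable names below are illustrative only.

```python
from collections import Counter

list_of_n = [2, 3, 5, 5, 5, 6, 2]
counts = Counter(list_of_n)  # single pass over the list

# Assumption: Python 3.7+, where dict (and so Counter) keeps insertion order.
in_first_seen_order = list(counts.items())

# Pairs sorted by count, highest count first.
by_frequency = counts.most_common()

# Only the values that actually occur more than once.
duplicates = [(n, c) for n, c in counts.items() if c > 1]

print(in_first_seen_order)
print(by_frequency)
print(duplicates)
```

Since the original goal was to find duplicates, filtering on a count greater than 1 may be closer to what is needed than the full list of pairs.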

Note that both the OrderedDict and the Counter approach require only one pass over the list. Your version could be amended to something like:

occurance_of_n = list(set((n, list_of_n.count(n)) for n in set(list_of_n)))
# [(6, 1), (3, 1), (5, 3), (2, 2)]

but each call to list.count scans the entire list again, once per unique element, which makes this approach quadratic in the worst case.
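To make the comparison concrete, here is a rough sketch that puts the count-based version next to the Counter version and checks that they agree. The function names counts_via_set and counts_via_counter are hypothetical, chosen for illustration. Because each (value, count) pair is built in a single expression from a single set, there is nothing for zip to misalign.

```python
from collections import Counter


def counts_via_set(values):
    """Pair each unique value with its count using list.count.

    The set is built once and each pair is built in one expression,
    so a value can never be matched with another value's count.
    Cost: one full scan of `values` per unique value.
    """
    unique = set(values)
    return [(n, values.count(n)) for n in unique]


def counts_via_counter(values):
    """Produce the same pairs from a single pass over `values`."""
    return list(Counter(values).items())


list_of_n = [2, 3, 5, 5, 5, 6, 2]

# Set iteration order is arbitrary, so compare the results after sorting.
same_result = sorted(counts_via_set(list_of_n)) == sorted(counts_via_counter(list_of_n))
print(same_result)
```

Both functions return the same pairs; the difference is only in how many times the input list is scanned.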

this is in 2.7, does that matter? — Connor, Jan 17 '18 at 22:00
The answer to my question was "no" — Connor, Jun 11 '18 at 02:50
