I have a workflow where I process lists, and keep track of which ones I've seen in a dictionary - because when I get to one that's been seen already, I can pass on it. The caveat is that two lists with the same elements ordered differently are considered duplicates. So to take a simple example of integer lists: [3, 1, 5, 6]
and [5, 3, 6, 1]
are equivalent.
How I've been doing this is, take tuple(sorted(L))
and put that in my dictionary. So for the above example seen
looks like:
seen = {..., (1, 3, 5, 6): 1, ...}
This has the nice advantage of being constant time lookup for each list I potentially need to process. The problem however is that in order to check if a given list is in seen
, I have to do tuple(sorted(L))
on it. And it turns out that for large amounts of data, this becomes prohibitive to the point of taking over 50% of the total time of my entire process.
I'd like to make use of collections.Counter
somehow - because Counter[3, 1, 5, 6]
and Counter[5, 3, 6, 1]
would evaluate equal. But Counter objects can't be used as dict keys. How can I keep my dictionary lookup, but without doing the sorting operation indicated above? Thanks in advance.
EDIT: In looking at the suggestions to use frozenset
I realized I left an important thing out, which is that elements can duplicate. So while the two lists above compare equal, [3, 1, 5, 6]
and [3, 3, 1, 5, 6, 6]
need to be seen as distinct. Since the frozenset
operation would remove duplicates, it isn't an option here.