
I just read the page over here: Get unique combinations of elements from a python list

The accepted solution works, but only with "small" lists (100 elements, for example).

I have a "big" list of strings (1 million elements) and I get the infamous "MemoryError" exception.

What is the best way to get unique combos on very large lists?

Thanks in advance

Sergio La Rosa
  • You don't. This is a bad idea. – user2357112 Jul 25 '17 at 17:12
  • What is the size of the combinations? Very quickly you'll have more data than you could hope to process – roganjosh Jul 25 '17 at 17:13
  • The size of combinations is "2" – Sergio La Rosa Jul 25 '17 at 17:14
  • Depending on how many of your 1M elements are indeed unique, you might be able to simply `uniques = tuple(set(elements))` and get the combinations of the elements in `uniques`? – inspectorG4dget Jul 25 '17 at 17:16
  • Thanks, this worked. – Sergio La Rosa Jul 25 '17 at 17:21
  • 1,000,000 choose 2 is 499,999,500,000, or just under 500 trillion. Depending on what they are and your computer, you might be able to do it. – Sam Craig Jul 25 '17 at 17:22
  • How many unique strings are in your list of a million strings? If you have lots of duplicate strings, then it _may_ be practical to build a list of all the pairs, but even then it's probably better to create an iterator, and process the pairs as they're produced. If all million strings are unique, then I doubt you have the RAM to hold a trillion pairs of strings (or half a trillion if you don't want both (a, b) and (b,a)). – PM 2Ring Jul 25 '17 at 17:22
  • @samcraig how can there be more than 1mil x 1 mil? – roganjosh Jul 25 '17 at 17:28
  • @roganjosh 499,999,500,000 is less than 1,000,000 squared. – Sam Craig Jul 25 '17 at 17:58
  • @samcraig aha, I think we're at a language barrier, 1mil x 1mil is called a trillion in UK so 500 trillion is significantly more. I didn't count the digits. – roganjosh Jul 25 '17 at 18:02
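The count discussed in the comments above can be checked directly. This is a minimal sketch, assuming Python 3.8+ for `math.comb`; the helper name `pair_count` is made up for illustration.

```python
import math


def pair_count(n: int) -> int:
    """Number of unordered pairs from n distinct items, i.e. n * (n - 1) / 2."""
    return math.comb(n, 2)


n = 1_000_000
print(f"{pair_count(n):,}")  # 499,999,500,000
```

That is about 5 × 10^11 pairs, roughly half of 1,000,000 squared, which is why holding them all in memory at once is unlikely to work if every string is unique.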

1 Answer

Per inspectorG4dget's comment and the linked answer, if a large number of values in your initial list are duplicated, filter them out through a set first, then find your combos.

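from itertools import combinations

# Placeholder data; substitute your own (gigantic) list here.
elements = ["a", "b", "a", "c"]

# Drop duplicates first so combinations() only sees distinct values.
uniques = tuple(set(elements))
# Join each 2-element combination into a comma-separated string.
combos = [','.join(str(thing) for thing in combo) for combo in combinations(uniques, 2)]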
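If the deduplicated list is still large, building the full `combos` list may hit the same MemoryError. Following PM 2Ring's comment, one possible alternative is to process the pairs lazily as they are produced. The sketch below is only an illustration: the function name `iter_unique_pairs` and the sample data are assumptions, and `dict.fromkeys` is used instead of `set` so that first-seen order is kept.

```python
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def iter_unique_pairs(elements: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield each 2-element combination of the distinct values, one at a time."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    uniques = tuple(dict.fromkeys(elements))
    # combinations() is itself lazy, so no list of pairs is ever materialized.
    yield from combinations(uniques, 2)


# Hypothetical sample data standing in for the real list.
sample = ["a", "b", "a", "c"]
for left, right in iter_unique_pairs(sample):
    print(f"{left},{right}")
```

Only the tuple of distinct values is held in memory; each pair is handled and discarded in turn. The total running time still grows with the number of pairs, so this helps with memory, not with the amount of work.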
Eric Ed Lohmar