I have a dictionary called lemma_all_context_dict, and it has approximately 8000 keys. I need a list of all possible pairs of these keys.

I used:

pairs_of_words_list = list(itertools.combinations(lemma_all_context_dict.keys(), 2)) 

However, when using this line I get a MemoryError. I have 8GB of RAM but perhaps I get this error anyway because I've got a few very large dictionaries in this code.

So I tried a different way:

pairs_of_words_list = []
for p_one in range(len(lemma_all_context_dict.keys())):
    for p_two in range(p_one+1, len(lemma_all_context_dict.keys())):
        pairs_of_words_list.append([lemma_all_context_dict.keys()[p_one], lemma_all_context_dict.keys()[p_two]])

But this piece of code takes around 20 minutes to run... Does anyone know of a more efficient way to solve the problem? Thanks

I don't think this question is a duplicate, because what I'm asking (and I don't think this has been asked) is how to implement this without my computer crashing :-P

Cheshie
  • Why are you preparing the list? What are you going to do with that? – thefourtheye Jan 15 '15 at 13:26
  • I need to loop over the list with a `for` loop and do some elaborate and annoying stuff with the pairs that are inside... – Cheshie Jan 15 '15 at 13:30
  • Use `itertools.product` and just iterate over the pairs, don't create the entire list. – thefourtheye Jan 15 '15 at 13:30
  • Thanks @thefourtheye; do `product` and `combinations` do the same thing? I think that `product` gives me a Cartesian product, meaning I'd have (a,b) but not (b,a)... am I wrong? – Cheshie Jan 15 '15 at 13:36
  • Yes, `product` is actually Cartesian product. If you want `b, a` also, you can use `combinations` – thefourtheye Jan 15 '15 at 13:37
  • Isn't that what I did in the first place...? Ah, I think you mean I should try what Pierre proposed.. OK, thanks. – Cheshie Jan 15 '15 at 13:39
  • Exactly, when you use `list`, all the possible combinations are stored in the list which is unnecessary for your use case. – thefourtheye Jan 15 '15 at 13:40
  • Just to be clear, `product(keys,repeat=2)` will give `(a,a), (a,b), (b,a), (b,b)`. `combinations(keys,2)` will give only `(a,b)`. `permutations(keys,2)` will give `(a,b), (b,a)`. – DSM Jan 15 '15 at 13:40
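
To make the difference between the three functions concrete, here is a minimal sketch on a two-element list. The list `keys` is a made-up stand-in for the dictionary's keys, not data from the question.

```python
import itertools

keys = ["a", "b"]  # hypothetical stand-in for the keys of lemma_all_context_dict

# Cartesian product with itself: ordered pairs, including (x, x)
product_pairs = list(itertools.product(keys, repeat=2))

# Unordered pairs of distinct elements: each pair appears once
combination_pairs = list(itertools.combinations(keys, 2))

# Ordered pairs of distinct elements: both (x, y) and (y, x)
permutation_pairs = list(itertools.permutations(keys, 2))
```

For "all possible pairs of keys" where order does not matter, `combinations` is the one that matches.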

1 Answer

Don't build a list, since that is what causes the memory error. (You actually create two lists, because in Python 2 .keys() returns a list as well.) Iterate over the iterator directly, which is what iterators are for:

import itertools

# Iterating over the dict yields its keys; no intermediate list is built
for a, b in itertools.combinations(lemma_all_context_dict, 2):
    print a, b
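
For a sense of scale: with roughly 8000 keys there are 8000 × 7999 / 2 = 31,996,000 unordered pairs, so materializing them all means tens of millions of tuple objects, which is likely to run into gigabytes. The sketch below is only an illustration; `example_dict` and `count_pairs` are hypothetical names, and the loop body stands in for whatever per-pair work is needed.

```python
import itertools

def count_pairs(n):
    """Number of unordered pairs of n distinct items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Hypothetical stand-in for lemma_all_context_dict, with made-up keys
example_dict = {"word%d" % i: None for i in range(5)}

# combinations() yields one pair at a time, so only the current pair is
# held in memory rather than the full list of pairs
seen = 0
for a, b in itertools.combinations(example_dict, 2):
    seen += 1  # placeholder for the real per-pair processing

# Size of the full list the question tried to build for about 8000 keys
total_for_8000 = count_pairs(8000)
```

Because the pairs are consumed as they are generated, memory use for the pairing itself stays small no matter how many keys there are; the running time still grows with the number of pairs.
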
Pierre
  • Thanks... I tried it - and it worked, so I'll accept it. But now I got the `MemoryError` a few rows later... I hope what you proposed really doesn't eat up too much of my memory. But thanks again :) – Cheshie Jan 15 '15 at 14:04