I have a list of 250,000 names. If I use itertools.combinations to get every unique pair (my list is named 'content'):
import itertools

for pair in itertools.combinations(content, 2):
    print(pair)
this will be 250,000 choose 2 pairs, which comes to about 31.2 billion. For each pair I want to compute a text similarity measure that I've already written. A few questions:
a) should I use a yield statement along with the itertools method to create a generator, since I can't fit that many pairs in memory? At some point, though, to use my text similarity function I'll need to keep the output in memory or write it to a file, no?
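Concretely, I was picturing something like the sketch below: stream each pair through the measure and write matches straight to disk, so nothing large ever sits in memory. (similarity() here is just a placeholder for my existing function, and the threshold is made up.)

import csv
import itertools

def similar_pairs(names, threshold=0.8):
    # itertools.combinations is already lazy, so the full set of
    # pairs is never materialized in memory
    for a, b in itertools.combinations(names, 2):
        score = similarity(a, b)  # placeholder for my existing measure
        if score >= threshold:
            yield a, b, score

# stream matches straight to disk, one row at a time
with open('matches.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name_a', 'name_b', 'score'])
    writer.writerows(similar_pairs(content))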
b) does this generalize to a more common problem that I'm just not aware of? Maybe a sparse matrix?
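The closest picture I have for (b) is a sparse matrix of scores where I only keep entries above some cutoff, something like the sketch below using scipy.sparse (the cutoff is made up, and similarity() is again a placeholder), though I'm not sure it's the right abstraction:

from scipy.sparse import dok_matrix
import itertools

n = len(content)
scores = dok_matrix((n, n))  # dict-of-keys: only nonzero entries are stored
for i, j in itertools.combinations(range(n), 2):
    s = similarity(content[i], content[j])  # placeholder for my measure
    if s >= 0.8:  # made-up cutoff
        scores[i, j] = s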
c) is there some pre-processing step that could reduce the size of the list?
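The only idea I've had for (c) is some kind of blocking: bucket the names by a cheap key (say, the first character) and only compare names within a bucket, so the quadratic blowup happens per bucket instead of over the whole list. Roughly:

from collections import defaultdict
import itertools

def blocked_pairs(names, key=lambda name: name[:1].lower()):
    # bucket names by a cheap blocking key; only compare within a bucket
    blocks = defaultdict(list)
    for name in names:
        blocks[key(name)].append(name)
    for block in blocks.values():
        yield from itertools.combinations(block, 2)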
Thanks!
Jeff