I have a list of lists, let's call it thelist, that looks like this:

[[Redditor(name='Kyle'), Redditor(name='complex_r'), Redditor(name='Lor')],
[Redditor(name='krispy'), Redditor(name='flyry'), Redditor(name='Ooer'), Redditor(name='Iewee')],
[Redditor(name='Athaa'), Redditor(name='The_Dark_'), Redditor(name='drpeterfost'), Redditor(name='owise23'), Redditor(name='invirtibrit')],
[Redditor(name='Dacak'), Redditor(name='synbio'), Redditor(name='Thosee')]]

thelist has 1000 elements (each one a list). I want to compare every pair of these lists and count the elements the two lists have in common. This is the code that does it:

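def calculate(list1, list2):
    # Count the elements of list1 that also appear in list2.
    a = 0
    for i in list1:
        if i in list2:
            a += 1
    return a

# Compare every unordered pair of lists exactly once.
for i in range(len(thelist) - 1):
    for j in range(i + 1, len(thelist)):
        print(calculate(thelist[i], thelist[j]))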

My problem is that the function is extremely slow: it takes 2 or more seconds per pair of lists, depending on their length. I'm guessing this has to do with my list structure. What am I missing here?

artre
  • Convert this to use sets instead of lists. Then apply set intersection, and return the length of the result. Can you supply a self-contained example for us to experiment? – Prune Oct 23 '17 at 18:46
  • Are `Redditor` objects hashable? – Patrick Haugh Oct 23 '17 at 18:47
  • To follow up on @Prune's comment, I'd recommend checking the `Redditor` class and seeing if it can be hashed. – Rafael Barros Oct 23 '17 at 18:47
  • Is there a method behind the organization of your list of lists? It could determine whether there's a better way to look up the values. – ividito Oct 23 '17 at 18:48
  • @Prune Thanks. Using a list of sets instead of a list of lists completely solved the problem. – artre Oct 23 '17 at 20:43

1 Answer

First, I would recommend making your class hashable; how to do that is covered in the question "What's a correct and good way to implement __hash__()?"
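As a rough illustration of what "hashable" means here, the sketch below defines `__eq__` and `__hash__` on the same field so that equal objects hash alike. The `User` class is a hypothetical stand-in identified only by `name`; it is not the real `Redditor` class, which may already define its own hashing.

```python
class User(object):
    """Hypothetical stand-in for a Redditor-like object (assumption)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Two users are equal when their names match.
        if not isinstance(other, User):
            return NotImplemented
        return self.name == other.name

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Hash the same field that __eq__ compares.
        return hash(self.name)

    def __repr__(self):
        return "User(name=%r)" % self.name


# Equal users collapse to one entry when placed in a set.
users = {User("Kyle"), User("Lor"), User("Kyle")}
```

The key design point is that `__hash__` and `__eq__` agree: objects that compare equal must return the same hash, otherwise set lookups give wrong answers.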

You can then make your list of lists a list of sets by doing:

thelist = [set(l) for l in thelist]

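Then your function will be much faster: `i in list2` on a set is a hash lookup that takes constant time on average, whereas on a list it scans the elements one by one. With sets you can also replace the loop with len(list1 & list2).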
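Putting the conversion and the intersection together, a minimal sketch of the pairwise count could look like this. The function name `pairwise_common` and the sample data are made up for illustration, and plain strings stand in for the `Redditor` objects.

```python
from itertools import combinations


def pairwise_common(list_of_lists):
    """Return {(i, j): number of common elements} for every pair i < j."""
    # Convert each inner list to a set once, up front.
    sets = [set(items) for items in list_of_lists]
    return {
        (i, j): len(sets[i] & sets[j])
        for i, j in combinations(range(len(sets)), 2)
    }


# Strings are used as stand-ins for Redditor objects (assumption).
sample = [
    ["Kyle", "complex_r", "Lor"],
    ["krispy", "Kyle", "Lor"],
    ["Dacak", "Lor"],
]
counts = pairwise_common(sample)
```

One behavioral difference from the original loop: a set keeps each element once, so a name repeated within a single list is counted only once.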

Spencer Bard