
How can I convert the following code into a list comprehension?

for i in range(xy - 1):
    for b in range(i + 1, xy):
        if fuzz.token_set_ratio(Names[i], Names[b]) >= 90:
            FuzzNames[b].append(ID[i])

Thanks for helping.

  • I always refer to this [answer](https://stackoverflow.com/questions/18072759/list-comprehension-on-a-nested-list/45079294#45079294) when converting back and forth. – quamrana Aug 09 '22 at 20:39
  • This is made a little more tricky because it doesn't return a single list. It's filling in some values, and leaving other values alone. – Tim Roberts Aug 09 '22 at 20:43
  • I don't think this can be a list comprehension. It's not creating new lists, it's appending to existing lists. And it's appending to a different list each time through the loop. I thought it could be done by swapping the order of the `for` loops, but `b` is dependent on `i`. – Barmar Aug 09 '22 at 20:46
  • I am running a fuzzy search across 32,000 lines of data; the code ran for 7 hours without finishing. I read about ways to make my code faster, and one suggestion was list comprehensions. This is the main block in my code, so I wondered how to do it. – Abdulrahman Hocaoglu Aug 09 '22 at 20:53
  • The list processing is NOT your bottleneck. Have you timed the individual fuzzy searches? If a single search takes 1 second, then 32,000 searches will take 9 hours. – Tim Roberts Aug 09 '22 at 20:59
  • I timed the fuzzy search for 100, 200, 400 and 800 searches to get an approximation. It will take more than 9 hours. I have made other changes to my code; the list processing was the last one. – Abdulrahman Hocaoglu Aug 09 '22 at 21:21
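
For scale, note that the nested loop in the question scores every unordered pair of names, not one search per line. A quick sketch of that arithmetic, assuming the 32,000 names mentioned in the comments; the per-comparison time is a hypothetical figure for illustration, not a measurement:

```python
from math import comb

n = 32_000  # number of names, taken from the comments above
pairs = comb(n, 2)  # unordered pairs (i, b) with i < b, as in the nested loop
print(pairs)  # 511984000

# Hypothetical cost per comparison; time your own scorer before relying on this.
seconds_per_comparison = 50e-6
estimated_hours = pairs * seconds_per_comparison / 3600
print(round(estimated_hours, 1))
```

With roughly half a billion comparisons, the time spent inside the scorer is likely to dominate whatever loop syntax wraps it.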

1 Answer


You could do something like this:

indices = [
    (i, b)
    for i in range(xy - 1)
    for b in range(i + 1, xy)
    if fuzz.token_set_ratio(Names[i], Names[b]) >= 90
]

for i, b in indices:
    FuzzNames[b].append(ID[i])

However, the way you originally wrote it is more readable and easier to understand, and it's not likely that you need that list of indices later on.
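
If the aim is only to flatten the nesting, `itertools.combinations` yields the same `(i, b)` pairs in the same order. Here is a minimal sketch, assuming a stand-in scorer built on `difflib` so the example runs without the fuzzy-matching library. The names `similarity`, `names`, `ids`, and `fuzz_names` are illustrative, and the scores will not match those of `fuzz.token_set_ratio`:

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in scorer on a 0-100 scale; NOT equivalent to fuzz.token_set_ratio."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Illustrative data standing in for Names, ID, and FuzzNames from the question.
names = ["Acme Corp", "ACME Corp", "Globex", "Initech"]
ids = [101, 102, 103, 104]
fuzz_names = [[] for _ in names]

# combinations(range(n), 2) yields every (i, b) with i < b,
# in the same order as the two nested range loops.
for i, b in combinations(range(len(names)), 2):
    if similarity(names[i], names[b]) >= 90:
        fuzz_names[b].append(ids[i])

print(fuzz_names)  # [[], [101], [], []]
```

This still performs one comparison per pair, so it should not be expected to run noticeably faster than the original loop.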

  • This is a little faster. The data I am working with has 11 columns, each with tens of thousands of lines. This will save me hours, thank you. – Abdulrahman Hocaoglu Aug 09 '22 at 21:14