0

I'm currently working on moving some excel worksheets over to python automation, and have come across a speed issue.

I have a list of lists containing around 10.000 lists each with 20 or so columns. I also have a list of account numbers (100.000 numbers)

I wish to iterate over my list of lists and then pick out values from the list, if the account number in the lists matches one in the account list.

By running this code, i am able to get my desired result, however it is painfully slow.

calc = 0
for row in listOfLists:
    if row[1] in Accounts:
        calc += row[8]

Any ideas on how to optimize for speed?

Henrik Poulsen
  • 935
  • 2
  • 13
  • 32

2 Answers2

7

Make Accounts a set. The contains check is O(1) for sets, but O(N) for lists. Your entire process' time complexity will be reduced from O(M*N) to O(M+N). You can further use sum with a conditional generator expression:

account_set = set(Accounts)
calc = sum(row[8] for row in listOfLists if row[1] in account_set)
user2390182
  • 72,016
  • 6
  • 67
  • 89
2

A functional approach of this, using the set over Accounts as @schwobaseggle answered:

account_set = set(Accounts)
calc = reduce(lambda row: row[8] if row[1] in account_set else 0, listOfLists , 0)
Netwave
  • 40,134
  • 6
  • 50
  • 93