
I currently have a list of lists (let's name it "big") that is about 9 columns and 5000 rows and growing. I have another list (let's name this one "small") that has approximately 3000 elements. My goal is to return each row in big whose last element (row[8]) can be found in small. The results will be stored in a list of lists.

I have used list comprehension which has been returning the proper output, but it is far too inefficient for my needs. It takes several seconds to process these 5000 rows (usually about 6.5 seconds, and its efficiency gets worse with larger lists), and it needs to be able to quickly handle hundreds of thousands of rows.

The list comprehension I wrote is:

results = [row for row in big if row[8] in small]

Sample data of list of lists (big):

[[23.4, 6.8, 9.0, 13.0, 4.0, 19.0, 2.5, 7.6, 1472709600000], 
[32.1, 15.5, 17.7, 21.7, 12.7, 27.7, 11.2, 16.3, 1472882400000], 
[40.8, 24.2, 26.4, 30.4, 21.4, 36.4, 19.9, 25.0, 1473055200000], 
[49.5, 32.9, 35.1, 39.1, 30.1, 45.1, 28.6, 33.7, 1473228000000], 
[58.2, 41.6, 43.8, 47.8, 38.8, 53.8, 37.3, 42.4, 1473400800000]]

Sample data of list (small):

[1472709600000, 1473055200000]

Desired output (results):

[[23.4, 6.8, 9.0, 13.0, 4.0, 19.0, 2.5, 7.6, 1472709600000], 
[40.8, 24.2, 26.4, 30.4, 21.4, 36.4, 19.9, 25.0, 1473055200000]]

Is there a more efficient way to return each row that has its last element found in another list?

3 Answers


You can easily eliminate the linear search of small on each iteration by using a set:

smallset = set(small)
results = [row for row in big if row[8] in smallset]
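For anyone wondering why this helps: `row[8] in small` scans the list element by element, which is O(n) per check, while `row[8] in smallset` is a hash lookup that averages O(1). A minimal benchmark sketch with synthetic stand-ins for the question's data (the names and sizes here are illustrative assumptions, not the asker's actual data):

import random
import timeit

small = random.sample(range(10**9), 3000)                        # ~3000 keys
big = [[0.0] * 8 + [random.choice(small)] for _ in range(5000)]  # 5000 rows

smallset = set(small)
print(timeit.timeit(lambda: [r for r in big if r[8] in small], number=1))
print(timeit.timeit(lambda: [r for r in big if r[8] in smallset], number=1))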
  • Oh of course! I was trying to figure out how to either make both big and small a set and compare them, or just make big a set. Totally glossed over the efficiency of making small a set. Thank you so much for your help! Making this change made my code run about 180x faster on the 5000 rows. – Hunter Corry Sep 28 '16 at 04:22
  • 180x is impressive. Could you post your results? – sisanared Sep 28 '16 at 04:29
  • Why is it faster with a set? I don't understand. – Jal Sep 28 '16 at 05:01
  • @Jal the chosen answer at this post explains it very well: [link](http://stackoverflow.com/questions/8929284/what-makes-sets-faster-than-lists-in-python) – Hunter Corry Sep 28 '16 at 05:13
  • @SSNR the run time decreased from about 6.5 seconds to about 0.035 seconds for the 5000 rows, and takes only 0.049 seconds for 10,000 rows – Hunter Corry Sep 28 '16 at 05:38

A quick and easy way to do this is to create a dictionary keyed on each row's last element (the column-8 item). Below is a code snippet.

big_dict = {}
for row in big:
    big_dict[row[-1]] = row  # assumes the last element is unique per row

output_list = []
for element in small:
    if element in big_dict:  # skip values in small that never appear in big
        output_list.append(big_dict[element])
  • I really like this approach. Using this, is there a way to also return a list of lists that contains those rows whose last element was not found in small? – Hunter Corry Sep 28 '16 at 04:43
  • @Hunter Corry Yes, you can do that:

        big_dict = {}
        for row in big:
            big_dict[row[-1]] = row
        output_list_1 = []
        output_list_2 = []
        for element_x in small:
            output_list_1.append(big_dict.pop(element_x))
        for element in big_dict.values():
            output_list_2.append(element)

    – Gopi Vinod Avvari Sep 28 '16 at 04:54
  • Thank you! This method is fractionally more efficient, but quite a bit harder to read. For this reason, I'm going to use what @John Zwinck recommended so others will be able to follow along more easily. If I run into more efficiency issues as the number of rows increases, I will definitely be switching to this method to squeeze the last bit of performance out of the code. – Hunter Corry Sep 28 '16 at 05:07
  • What if two lists in big have the same last element? It doesn't work. – Jal Sep 28 '16 at 05:08
  • That's a good point @Jal, but luckily in my use, the last element is actually a primary key for a SQL table, so the last element will always be unique. – Hunter Corry Sep 28 '16 at 05:11
  • I just gave an idea. One can check whether a key already exists and, if so, turn the existing 1D list into a 2D list, and do the same when checking the elements of small (see the sketch below). I know it might be a little too much, but it has its own advantages in some cases. – Gopi Vinod Avvari Sep 28 '16 at 05:12
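A minimal sketch of that grouping idea, assuming duplicate last elements should all be kept (the defaultdict grouping here is an illustration, not part of the original answer):

from collections import defaultdict

groups = defaultdict(list)  # last element -> every row that shares it
for row in big:
    groups[row[-1]].append(row)

# Rows matching small, in small's order; keys absent from big are skipped.
results = [row for key in small for row in groups.get(key, [])]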

Sort small once, then binary search it for each row, which reduces each membership check from O(n) to O(log n).

from bisect import bisect_left

small_sorted = sorted(small)  # binary search requires sorted input

def binary_search(x):
    i = bisect_left(small_sorted, x)  # leftmost insertion point for x
    return i < len(small_sorted) and small_sorted[i] == x

results = [row for row in big if binary_search(row[8])]
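For comparison: sorting costs O(n log n) once and each of the m row lookups costs O(log n), so the whole filter runs in O((m + n) log n), while the set-based approach averages O(m + n). In practice the set is simpler and usually faster here.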
  • Although this code may help to solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term educational value. Please [edit] your answer to add explanation, including what limitations and assumptions apply. – Toby Speight Sep 28 '16 at 08:38