0

I have a python list e with ~11,000 elements. I then have a list of indexes p of ~3,000 elements.

I want to filter e to keep only the elements at the indexes specified in p.

So far, I'm using simple list comprehension:

f = [x for i,x in enumerate(e) if i in p]

However, this implementation takes ~1s.

This might not be much, but as I have to do it for 10,000 lists, it becomes over 2 hours. I then have to repeat this again for 200 batches of 10,000 lists, so it's really too slow.

Any idea of how I can achieve the same result in a quicker manner?

MrD
  • 4,986
  • 11
  • 48
  • 90

1 Answers1

5

Turn p into a set. The i in p containment test against a list takes O(length_of_list) linear time, whilst testing against a set takes O(1) constant time:

p_set = set(p)
f = [x for i, x in enumerate(e) if i in p_set]

This makes filtering a O(length_of_e) operation, so 11k steps. With p a list, you made up to O(length_of_e * length_of_p) steps, so 33 million.

However, if p is a sorted list, you already have your indices in the correct order, and you can just loop over p to select the elements:

f = [e[i] for i in p]

Now you are taking only 3k steps.

If p is not sorted, the second version will produce the items in a different order from what they were listed in in e. That could be fine, or you could sort p first. However, sorting takes O(N log N) steps; with 3k items in p that’d take 3k times log(3k) == 3k times 8 == 24k steps, so not worth your time here over the first approach which is more than twice as efficient here.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • I understand the first part of your answer. In the second part, can I seek clarification as to why the indicies must be in the correct order in order for you to loop through them in this way? – tim-mccurrach Jan 22 '18 at 17:33
  • 2
    @Tim it depends: if `p` is not sorted the second part produces different output as you then produce the values in the order `p` lists them in. If that’s fine, then just stick with the second version regardless of `p` being sorted. – Martijn Pieters Jan 22 '18 at 17:35