2

I want to select entries matching a certain value out of a list of dicts in python 3. This should result in two lists: a new list with the selected entries and the modified original list without them.

Scenario

Assume we have a list of dicts:

import random, sys, time

letters_1 = []
colors = ["red", "orange", "yellow", "green", "blue", "purple"]
for i in range(100000):
    letter = {"color": random.choice(colors), "content": random.randint(0, sys.maxsize)} 
    letters_1.append(letter)
letters_2 = list(letters_1)

We want to select all dicts with a certain value for a certain key, collect them into a new list and leave only the others in the initial list. This corresponds to how one would select all red-colored letters out of an actual stack of letters.

Possibilities

This can be done via list comprehension or via a for loop.

The problem with the list comprehension is that every one list comprehension only creates one list. I.e. in order to do what we want to do, we must go through the list twice: first copy the selected items into a new list, then remove the selected items the original list. To continue the script:

time_0 = time.time()
red_letters_1 = [letter for letter in letters_1 if letter["color"]=="red"]
letters_1 = [letter for letter in letters_1 if letter["color"]!="red"]
time_1 = time.time()

The problem with the for loop is that it leads to more convoluted code and that it (surprisingly) takes longer to execute:

time_2 = time.time()
red_letters_2 = []
other_letters_2 = []
for letter in letters_2:
    if letter["color"] == "red":
        red_letters_2.append(letter)
    else:
        other_letters_2.append(letter)
letters_2 = other_letters_2
time_3 = time.time()

print(time_1 - time_0)
print(time_3 - time_2)

Output:

0.011380434036254883
0.015761613845825195

Note: You can remove the need for having a second list other_letters_2 by going through the list backwards and using pop(), but this takes even longer (more than 10 times longer, actually).

Question

While the possibility with two list comprehensions is clearly the fastest of these possibilities, it seems inefficient to do two list comprehensions. Is it possible to fold this into one list comprehension (without making it inefficient)? Is there another more efficient way? Or is there a reason why it is not possible to speed things up beyond the possibility with two list comprehensions?

Note on related questions

The question has been suggested to be a duplicate of this thread, where the question is about selecting two subsets of a list using list comprehension (or for loops). In this case, the only way is to test two different conditions, which may perhaps be shortened at the expense of some readability by applying a nested list comprehension as suggested in this answer.

Since (1) this solution to the suggested duplicate is not an option for the present question and (2) (as pointed out by Ev. Kounis ) the present question potentially allows for different solutions by modifying the original list in the list comprehension, I submit that this is not a duplicate (not an exact one in any case). I clarified this also in the beginning of the question.

Python version: 3.6.2

0range
  • 2,088
  • 1
  • 24
  • 32
  • *"The problem with the list comprehension is that every one list comprehension only creates one list"*. They do create only one, but in doing so they can modify another one as well by using `pop` for example.. – Ma0 Aug 24 '17 at 15:24
  • Are you 100% sure that this is going to called so many millions of times on such enormous lists, that a speed increase would be worth the tradeoff of less clear / maintainable code? Or are you just asking hypothetically? – Arthur Tacca Aug 24 '17 at 15:26
  • I do not think this question is a duplicate. OP here wants to modify the initial one and create a new one. Not two new ones. – Ma0 Aug 24 '17 at 15:28
  • @Ev.Kounis I cannot think of any way to use `pop` inside the list comprehension that would not interfere with the with the index and the way the list comprehension goes through the list. (In a for loop, you can go through the list backwards, but as mentioned in the original post, this takes even longer.) – 0range Aug 24 '17 at 16:20
  • @ArthurTacca I may need to execute this many times so that it may make a noticeable difference. It is probably not worth losing readability. But the double list comprehension features two lines that are almost exactly identical, which is also often not the best possible style. – 0range Aug 24 '17 at 16:23
  • @0range You are right about the list-comprehension (you have to modify the list as you loop through it and that is not a good idea). Creating a copy and using that instead will probably take too long even with `json` or `pickle` – Ma0 Aug 25 '17 at 07:43

1 Answers1

1

How about this: (i reduced the sample size to test but you can crank it back up)

import random, sys

letters_1 = []
colors = ["red", "orange", "purple"]
for i in range(10):
    letter = {"color": random.choice(colors), "content": random.randint(0, sys.maxsize)}
    letters_1.append(letter)

letters_1, letters_2 = [[x for x in letters_1 if x['color'] in i] for i in [('red', ), ("orange", "purple")]]

That is a single list-comprehension that takes advantage of variable unpacking.

Let me know how this performs in comparison. I am optimistic.


As you will also notice, the above code does not modify the original and create a new one but creates two new ones instead (overwrites the original)

Ma0
  • 15,057
  • 4
  • 35
  • 65
  • Thanks, this is essentially an adaptation of [the solution to the other thread](https://stackoverflow.com/a/14100580/1735215). Surprisingly, it is slower than the one with two list comprehensions: Two list comprehensions: `0.011686563491821289`, For loop: `0.016497135162353516`, Nested list comprehension (`red_letters_4, letters_4 = [[x for x in letters_4 if x['color'] in i] for i in [("red"), ("orange", "yellow", "green", "blue", "purple")]]`): `0.014462709426879883` – 0range Aug 27 '17 at 13:56