Python list comprehensions to create multiple lists

Question

I want to create two lists listOfA and listOfB to store indices of A and B from another list s.

s=['A','B','A','A','A','B','B']

Output should be two lists

listOfA=[0,2,3,4]
listOfB=[1,5,6]

I am able to do this with two statements.

listOfA=[idx for idx,x in enumerate(s) if x=='A']
listOfB=[idx for idx,x in enumerate(s) if x=='B']

However, I want to do it in only one iteration, using list comprehensions only. Is it possible to do it in a single statement? Something like listOfA,listOfB=[--code goes here--]

@kojiro: No, complexity is not an issue here; I just want to explore features of Python. – Heisenberg, Jan 09 '14 at 15:03
Since the question is closed, I'll add an answer here: `s = ['A','B','A','A','A','B','B']` `listOfA, listOfB = [], []` `[listOfA.append(i) if c == 'A' else listOfB.append(i) for i, c in enumerate(s)]` – gvm, Sep 21 '21 at 15:26

Martijn Pieters · Accepted Answer · 2014-01-09T15:09:23.220

57

The very definition of a list comprehension is to produce one list object. Your 2 list objects are of different lengths even; you'd have to use side-effects to achieve what you want.

Don't use list comprehensions here. Just use an ordinary loop:

listOfA, listOfB = [], []

for idx, x in enumerate(s):
    target = listOfA if x == 'A' else listOfB
    target.append(idx)

This leaves you with just one loop to execute; this will beat any two list comprehensions, at least until the developers find a way to make list comprehensions build a list twice as fast as a loop with separate list.append() calls.
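If you want to check the timing claim on your own machine, something like the following should do. It is only a rough sketch: the function names one_loop and two_comprehensions and the enlarged input are made up for illustration, and the numbers depend on the Python version and the data, so no expected timings are shown.

```python
import timeit

# Hypothetical larger input so the timings are measurable.
s = ['A', 'B', 'A', 'A', 'A', 'B', 'B'] * 10_000


def one_loop(seq):
    """Single pass, appending each index to one of two lists."""
    list_of_a, list_of_b = [], []
    for idx, x in enumerate(seq):
        target = list_of_a if x == 'A' else list_of_b
        target.append(idx)
    return list_of_a, list_of_b


def two_comprehensions(seq):
    """Two passes, one list comprehension per output list."""
    list_of_a = [idx for idx, x in enumerate(seq) if x == 'A']
    list_of_b = [idx for idx, x in enumerate(seq) if x == 'B']
    return list_of_a, list_of_b


# Both approaches must agree before the timings mean anything.
assert one_loop(s) == two_comprehensions(s)

if __name__ == '__main__':
    for func in (one_loop, two_comprehensions):
        seconds = timeit.timeit(lambda: func(s), number=20)
        print(f'{func.__name__}: {seconds:.3f}s for 20 runs')
```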

I'd pick this any day over a nested list comprehension just to be able to produce two lists on one line. As the Zen of Python states:

Readability counts.

edited Jan 09 '14 at 15:09

answered Jan 09 '14 at 14:55

Martijn Pieters


Is list comprehension (to generate a single list) faster than list generation by appending? – Heisenberg Jan 09 '14 at 15:01
2

@Heisenberg: yes, because Python can do the list building entirely in C then. No pesky Python stack pushes and pops, no `.append()` attribute lookups. We can optimize the latter a little (use `A, B = listOfA.append, listOfB.append` outside the loop and reuse those), but the stack call is still going to be slower than the C code. – Martijn Pieters Jan 09 '14 at 15:04
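The optimization mentioned in this comment, looking up the append methods once outside the loop, might look like the sketch below. The names add_a and add_b are just illustrative (the comment itself calls them A and B).

```python
s = ['A', 'B', 'A', 'A', 'A', 'B', 'B']

listOfA, listOfB = [], []
# Look up the bound methods once instead of on every iteration.
add_a, add_b = listOfA.append, listOfB.append

for idx, x in enumerate(s):
    if x == 'A':
        add_a(idx)
    else:
        add_b(idx)

print(listOfA)  # [0, 2, 3, 4]
print(listOfB)  # [1, 5, 6]
```

Each call still goes through the normal Python function-call machinery, which is the part the comment says stays slower than the C loop inside a comprehension.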

score 13 · Answer 2 · answered Jan 09 '14 at 14:55


Sort of; the key is to generate a 2-element list that you can then unpack:

listOfA, listOfB = [[idx for idx, x in enumerate(s) if x == c] for c in 'AB']

That said, I think it's pretty daft to do it that way; an explicit loop is much more readable.

answered Jan 09 '14 at 14:55

RemcoGerlich


15

This still loops twice, and is mightily unreadable. – Martijn Pieters Jan 09 '14 at 14:56

Abhijit · Answer 3 · 2014-01-09T15:13:57.520

6

A nice approach to this problem is to use defaultdict. As @Martijn already said, a list comprehension is not the right tool for producing two lists. Using defaultdict lets you segregate the indices in a single iteration. Moreover, your code would not be limited to just two distinct values.

>>> from collections import defaultdict
>>> s=['A','B','A','A','A','B','B']
>>> listOf = defaultdict(list)
>>> for idx, elem in enumerate(s):
...     listOf[elem].append(idx)
...
>>> listOf['A'], listOf['B']
([0, 2, 3, 4], [1, 5, 6])
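To illustrate the point about extending this beyond two values, here is the same loop run on an input with three distinct values. The input list and the name indices are made up for this example.

```python
from collections import defaultdict

# Hypothetical input with three distinct values instead of two.
s = ['A', 'B', 'C', 'A', 'C', 'B', 'B']

indices = defaultdict(list)
for idx, elem in enumerate(s):
    # A missing key is created with an empty list on first access.
    indices[elem].append(idx)

print(dict(indices))  # {'A': [0, 3], 'B': [1, 5, 6], 'C': [2, 4]}
```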

edited Jan 09 '14 at 15:13

answered Jan 09 '14 at 15:00

Abhijit


2

For two keys, I'd put money on my conditional statement beating your `hash(elem)` calls. – Martijn Pieters Jan 09 '14 at 15:02
@MartijnPieters: I won't bet on this with you. I am just providing an alternative, in case the OP wants to extend this idea over multiple keys (items). – Abhijit Jan 09 '14 at 15:03

abarnert · Answer 4 · 2018-07-20T20:27:30.680

What you're trying to do isn't exactly impossible; it's just complicated, and probably wasteful.

If you want to partition an iterable into two iterables, and the source is a list or other re-usable iterable, you're probably better off doing it in two passes, as in your question.

(Even if the source is an iterator, if the output you want is a pair of lists, not a pair of lazy iterators, either use Martijn's answer, or do two passes over list(iterator).)

But if you really need to lazily partition an arbitrary iterable into two iterables, there's no way to do that without some kind of intermediate storage.

Let's say you partition [1, 2, -1, 3, 4, -2] into positives and negatives. Now you try to next(negatives). That ought to give you -1, right? But it can't do that without consuming the 1 and the 2. Which means when you try to next(positives), you're going to get 3 instead of 1. So, the 1 and 2 need to get stored somewhere.
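The problem described above can be reproduced with a deliberately naive version that shares one iterator between two generator expressions and stores nothing. The name naive_partition is made up for this demonstration; it is not something you would want to use.

```python
def naive_partition(pred, iterable):
    """Broken on purpose: both generators pull from the same iterator."""
    it = iter(iterable)
    trues = (x for x in it if pred(x))
    falses = (x for x in it if not pred(x))
    return trues, falses


positives, negatives = naive_partition(lambda x: x >= 0, [1, 2, -1, 3, 4, -2])

first_negative = next(negatives)  # skips past 1 and 2 to reach -1
first_positive = next(positives)  # 1 and 2 are already gone

print(first_negative)  # -1
print(first_positive)  # 3
```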

Most of the cleverness you need is wrapped up inside itertools.tee. If you just make positives and negatives into two teed copies of the same iterator, then filter them both, you're done.

In fact, this is one of the recipes in the itertools docs:

from itertools import filterfalse, tee

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

(If you can't understand that, it's probably worth writing it out explicitly, with either two generator functions sharing an iterator and a tee via a closure, or two methods of a class sharing them via self. It should be a couple dozen lines of code that doesn't require anything tricky.)
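One way of writing it out explicitly is sketched below, with two generators that share the source iterator and a pair of queues through a closure. This is an illustration under my own assumptions, not how itertools implements tee; the names partition_explicit, falseq and trueq are chosen here.

```python
from collections import deque


def partition_explicit(pred, iterable):
    """Return (false entries, true entries) as two lazy iterators."""
    it = iter(iterable)
    falseq, trueq = deque(), deque()

    def gen(own_queue, other_queue, wanted):
        while True:
            # Values the other generator already pulled for us come first.
            if own_queue:
                yield own_queue.popleft()
                continue
            try:
                value = next(it)
            except StopIteration:
                return
            if bool(pred(value)) == wanted:
                yield value
            else:
                # Not ours: store it until the other generator asks for it.
                other_queue.append(value)

    return gen(falseq, trueq, False), gen(trueq, falseq, True)


negatives, positives = partition_explicit(lambda x: x >= 0, [1, 2, -1, 3, 4, -2])
print(next(negatives))  # -1 (1 and 2 are parked in trueq)
print(next(positives))  # 1
```

The two queues are the intermediate storage: whatever one side skips over has to wait there for the other side.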

And you can even get partition as an import from a third-party library like more_itertools.

Now, you can use this in a one-liner:

lst = [1, 2, -1, 3, 4, -2]
negatives, positives = partition(lambda x: x >= 0, lst)

… and you've got an iterator over all the positive values, and an iterator over all of the negative values. They look like they're completely independent, but together they only do a single pass over lst—so it works even if you assign lst to a generator expression or a file or something instead of a list.
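Here is a small self-contained run of the recipe with a generator as the source, to check the single-pass behavior. Note that the recipe takes the predicate first and returns the false entries first. The sample data is the list from this answer; the name source is just for the example.

```python
from itertools import filterfalse, tee


def partition(pred, iterable):
    """itertools recipe: returns (false entries, true entries)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


# A generator can only be consumed once, so this exercises the single pass.
source = (n for n in [1, 2, -1, 3, 4, -2])
negatives, positives = partition(lambda x: x >= 0, source)

print(list(positives))  # [1, 2, 3, 4]
print(list(negatives))  # [-1, -2]
```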

So, why isn't there some kind of shortcut syntax for this? Because it would be pretty misleading.

A comprehension takes no extra storage. That's the reason generator expressions are so great—they can transform a lazy iterator into another lazy iterator without storing anything.

But this takes O(N) storage. Imagine all of the numbers are positive, but you try to iterate negatives first. What happens? Every number gets pushed into the buffer that tee keeps for the positives iterator. In fact, that O(N) could even be infinite (e.g., try it on itertools.count()).

That's fine for something like itertools.tee, a function stuck in a module that most novices don't even know about, and which has nice docs that can explain what it does and make the costs clear. But doing it with syntactic sugar that made it look just like a normal comprehension would be a different story.

score 2 · Answer 5 · answered Oct 23 '18 at 10:42


For those who live on the edge ;)

listOfA, listOfB = [[i for i in cur_list if i is not None] for cur_list in zip(*[(idx,None) if value == 'A' else (None,idx) for idx,value in enumerate(s)])]
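For anyone trying to read that, the one-liner can be unpacked into steps like this. The intermediate names pairs, column_a and column_b are mine, not part of the answer.

```python
s = ['A', 'B', 'A', 'A', 'A', 'B', 'B']

# Step 1: one pass over s, producing a pair per element with the index in the
# slot for its group and None in the other slot.
pairs = [(idx, None) if value == 'A' else (None, idx) for idx, value in enumerate(s)]

# Step 2: transpose the list of pairs into two columns.
column_a, column_b = zip(*pairs)

# Step 3: drop the None placeholders from each column.
listOfA = [i for i in column_a if i is not None]
listOfB = [i for i in column_b if i is not None]

print(listOfA)  # [0, 2, 3, 4]
print(listOfB)  # [1, 5, 6]
```

So s itself is only iterated once, but the intermediate pairs and columns are each walked again afterwards.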

answered Oct 23 '18 at 10:42

Daniel Braun

