Separate list by entries

Question

I have a Python list that I know contains the entries 1, 2, and 7, e.g.,

data = [1, 7, 2, 1, 1, 1, 2, 2, 7, 1, 7, 7, 2]

I would now like to get all of the indices of each entry, i.e.,

g1 = [0, 3, 4, 5, 9]
g2 = [2, 6, 7, 12]
g7 = [1, 8, 10, 11]

The data array can be long, so efficiency matters. How do I achieve this?

`g = {target: [index for index, val in enumerate(data) if val == target] for target in set(data)}`? Then the indices of `1` would be `g[1]`. — jonrsharpe, Nov 09 '15 at 14:01
This way, I would need to iterate over the list several times, which takes too long in my application. — Nico Schlömer, Nov 09 '15 at 14:02
So use `g = collections.defaultdict(list)` and then `g[val].append(index)`? What have you actually *tried*, and what is the problem with it? — jonrsharpe, Nov 09 '15 at 14:03
I have no idea why this question has four close votes as "unclear what you are asking" ... did they even read the question? — Kijewski, Nov 09 '15 at 14:30
@Kay *"unclear what you mean by 'best'"*? *"Unclear why you haven't done anything yourself"*? — jonrsharpe, Nov 09 '15 at 14:34
It isn't clear to me why this question was closed. I have now removed the "best" in "best achieve this" since that didn't add clarity, but other than that, I'm unsure what to change. Describing my failed attempts doesn't contribute to the clarity of the question, I believe. — Nico Schlömer, Nov 11 '15 at 01:33
Does [How to find all occurrences of an element in a list?](http://stackoverflow.com/questions/6294179/how-to-find-all-occurrences-of-an-element-in-a-list) answer your question? — Ilmari Karonen, Nov 20 '15 at 04:22

score 4 · Accepted Answer · answered Nov 09 '15 at 14:04

You could use a defaultdict to collect the indices of each value in a single pass over the list:

In [1]: from collections import defaultdict

In [2]: data = [1, 7, 2, 1, 1, 1, 2, 2, 7, 1, 7, 7, 2]

In [3]: indices = defaultdict(list)

In [4]: for i, d in enumerate(data):
   ...:     indices[d].append(i)
   ...:     

In [5]: indices
Out[5]: defaultdict(<class 'list'>, {1: [0, 3, 4, 5, 9], 2: [2, 6, 7, 12], 7: [1, 8, 10, 11]})
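For reuse, the same single-pass idea can be wrapped in a small function. The name group_indices is hypothetical and not part of the answer. Converting the result to a plain dict at the end is a design choice: afterwards, looking up a value that never occurred raises KeyError instead of silently inserting an empty list, as a defaultdict would.

```python
from collections import defaultdict


def group_indices(data):
    """Map each distinct value in data to the list of indices where it occurs.

    The list is traversed exactly once; indices are appended in increasing order.
    """
    indices = defaultdict(list)
    for i, value in enumerate(data):
        indices[value].append(i)
    # Return a plain dict so that missing keys raise KeyError
    # rather than being silently created as empty lists.
    return dict(indices)


data = [1, 7, 2, 1, 1, 1, 2, 2, 7, 1, 7, 7, 2]
groups = group_indices(data)
g1, g2, g7 = groups[1], groups[2], groups[7]
print(g1)  # [0, 3, 4, 5, 9]
print(g2)  # [2, 6, 7, 12]
print(g7)  # [1, 8, 10, 11]
```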

score 1 · Answer 2 · answered Nov 09 '15 at 14:11

Though Werkzeug's MultiDict is not really meant for this job, it works well:

from werkzeug.datastructures import MultiDict

data = [1, 7, 2, 1, 1, 1, 2, 2, 7, 1, 7, 7, 2]

g = MultiDict((v, i) for i, v in enumerate(data))
g1 = g.getlist(1)
g2 = g.getlist(2)
g7 = g.getlist(7)

print(g7)
# [1, 8, 10, 11]

score 0 · Answer 3 · edited Nov 09 '15 at 14:32

How about something more dynamic like this?

data = [1, 7, 2, 1, 1, 1, 2, 2, 7, 1, 7, 7, 2]
index_dict = {}

for i, val in enumerate(data):
    # Get or create the list of indices for this value
    indices = index_dict.setdefault(val, [])

    # Record the current index
    indices.append(i)

This code creates an entry for each value it encounters and appends every index at which that value occurs. You can then look up any value in the dictionary to get all of its indices.

While this code is less elegant than a list comprehension, it has the advantage of iterating through the data only once.
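Since the question states that the list only contains 1, 2 and 7, another single-pass variant is to create those lists up front and skip setdefault entirely. This is a sketch under that assumption; the KNOWN_VALUES tuple is illustrative, not from the answer. Any unexpected value raises KeyError, which doubles as a cheap sanity check on the input.

```python
data = [1, 7, 2, 1, 1, 1, 2, 2, 7, 1, 7, 7, 2]

# Assumption: the set of possible entries is known in advance (per the question).
KNOWN_VALUES = (1, 2, 7)

# One empty list per known value; no setdefault or defaultdict needed.
index_dict = {value: [] for value in KNOWN_VALUES}

for i, val in enumerate(data):
    # Raises KeyError if data contains a value outside KNOWN_VALUES.
    index_dict[val].append(i)

g1, g2, g7 = index_dict[1], index_dict[2], index_dict[7]
```

One consequence of this design: a known value that never occurs still gets an empty list, whereas the setdefault and defaultdict versions simply have no entry for it.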

answered Nov 09 '15 at 14:10 by Gab · edited Nov 09 '15 at 14:32

1. You don't need `0` in the call to `range`, that's the default. 2. You should test for `None` by identity, with `is` (an empty list is also false-y, but can certainly be appended to). 3. You've just rewritten your own `defaultdict`, not even using e.g. `sub_dict = index_dict.get(val, [])`. – jonrsharpe Nov 09 '15 at 14:18
@jonrsharpe Using `sub_dict = index_dict.get(val, [])` would not create a new entry in index_dict. Testing with `is None` is not required because the entry will always have a value with this code but I updated it anyway. `range(len(data))`, you're totally right. – Gab Nov 09 '15 at 14:28
@Gab you're right, sorry - I meant `setdefault`, not `get`. – jonrsharpe Nov 09 '15 at 14:29
@jonrsharpe, I did not know about `setdefault`, my code is way better now, thanks :) – Gab Nov 09 '15 at 14:32

score -1 · Answer 4 · edited Nov 09 '15 at 14:08

You could use itertools.compress:

import itertools

data = [1, 7, 2, 1, 1, 1, 2, 2, 7, 1, 7, 7, 2]

# compress keeps each index whose selector is truthy; list() materializes the iterator
g1 = list(itertools.compress(range(len(data)), (x == 1 for x in data)))
g2 = list(itertools.compress(range(len(data)), (x == 2 for x in data)))
g7 = list(itertools.compress(range(len(data)), (x == 7 for x in data)))

answered Nov 09 '15 at 14:07 by Mr. E · edited Nov 09 '15 at 14:08 by awesoon

Like my first suggestion, this will iterate over `data` multiple times – jonrsharpe Nov 09 '15 at 14:08
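To check in practice the difference jonrsharpe points out, here is a rough timeit comparison of one pass versus one pass per distinct value. The function names, the 100,000-element random test list and the repeat count are arbitrary choices for illustration. The printed timings depend on machine and Python version, so treat them as indicative only.

```python
import itertools
import random
import timeit
from collections import defaultdict


def single_pass(data):
    """Collect indices for every value in one traversal of data."""
    groups = defaultdict(list)
    for i, value in enumerate(data):
        groups[value].append(i)
    return dict(groups)


def multi_pass(data):
    """Run itertools.compress once per distinct value (one traversal each)."""
    return {
        target: list(itertools.compress(range(len(data)), (x == target for x in data)))
        for target in set(data)
    }


# Hypothetical test input: 100,000 entries drawn from {1, 2, 7}.
random.seed(0)
data = [random.choice((1, 2, 7)) for _ in range(100_000)]

# Both approaches must agree before comparing their speed is meaningful.
assert single_pass(data) == multi_pass(data)

for func in (single_pass, multi_pass):
    seconds = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```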
