4

I'm looking for a concise Python idiom to pack a list of group indexes like this one

[0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4]

into this, where the positions of each group index are collected in their own list:

[[0, 1, 2, 3, 4], [5, 6, 7], [8, 9], [10], [11, 12, 13]]

I have already done it with a list comprehension plus an append loop like the following, but I feel like there's a Python one-liner that could do that. I'm working on lists that sometimes reach 10000+ items, so performance is important.

li = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4]

result = [[] for _ in xrange(max(li)+1)]

for i in xrange(len(li)):
    result[li[i]].append(i)
Yu Hao
Kotch
  • What are you doing with the resulting list? Maybe there is an overall simpler and/or faster solution, possibly using `numpy`. – mkrieger1 Jun 23 '15 at 11:41
  • The base principle was to be able to select random indexes from the first list, but by selecting the whole "base index" group. E.g. 60% could roughly take 1s, 2s and 4s, finally returning [5,6,7,8,9,11,12,13]. It later became a library function, so I guess usages will vary, that's why I wanted to convert it beforehand. I can't use numpy in my current environment but I'll take a look at it out of curiosity. – Kotch Jun 24 '15 at 08:28
  • What should be the result if the input list is `[1, 1, 1, 0, 0, 2, 2, 3, 5, 5, 5, 4, 3]`? – mkrieger1 Jun 24 '15 at 09:25
  • And what should be the result if the input list is `[5, 100]`? – mkrieger1 Jun 24 '15 at 09:26
  • The input list comes from a pre-defined function that always returns an ascending and consecutive list, so no biggie with that (at least for me). And if it doesn't for someone else, well just sort it beforehand :) – Kotch Jun 24 '15 at 20:40
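
For the unsorted and non-consecutive inputs raised in the comments, a one-pass grouping with a dictionary is one possible sketch. It assumes Python 3, that groups should come back in ascending order of their value, and that values which never occur should produce no list at all (unlike the `max(li)+1` approach in the question). The name `group_positions_by_value` is made up for illustration. Note that sorting the input beforehand would change the positions, so this sketch records the original positions instead.

```python
from collections import defaultdict


def group_positions_by_value(values):
    """Collect the positions of each distinct value in a single pass."""
    positions = defaultdict(list)
    for pos, value in enumerate(values):
        positions[value].append(pos)
    # Assumption: groups are returned in ascending order of their value.
    return [positions[value] for value in sorted(positions)]


print(group_positions_by_value([1, 1, 1, 0, 0, 2, 2, 3, 5, 5, 5, 4, 3]))
# [[3, 4], [0, 1, 2], [5, 6], [7, 12], [11], [8, 9, 10]]
print(group_positions_by_value([5, 100]))
# [[0], [1]]
```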

4 Answers

3

You can use itertools.groupby to group the values. Then calculate the indices based on the lengths of each group, and keep a running count of the starting index for that group.

from itertools import groupby
def index_list(l):
    start = 0  # position in l where the current group begins
    result = []
    for _, group in groupby(l):
        size = len(list(group))  # number of equal adjacent values
        result.append(list(range(start, start + size)))
        start += size
    return result

Example

>>> l = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4]
>>> index_list(l)
[[0, 1, 2, 3, 4], [5, 6, 7], [8, 9], [10], [11, 12, 13]]
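
If a single expression is preferred, a variant is to run `groupby` over `enumerate(l)` so that each group already carries its positions and no running offset is needed. This is only a sketch, written for Python 3; `group_positions` is a hypothetical name, and like the function above it groups runs of equal adjacent values, so it relies on the input being sorted.

```python
from itertools import groupby
from operator import itemgetter


def group_positions(values):
    """Return one list of positions per run of equal adjacent values."""
    return [
        [pos for pos, _ in run]  # keep the position, drop the value
        for _, run in groupby(enumerate(values), key=itemgetter(1))
    ]


l = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4]
print(group_positions(l))
# [[0, 1, 2, 3, 4], [5, 6, 7], [8, 9], [10], [11, 12, 13]]
```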
Cory Kramer
2

Not sure if this is better than the other answers, but I found it interesting to work it out nonetheless:

li = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4]

from collections import Counter

result = []
last = 0

for k,v in sorted(Counter(li).items()):
    result.append(list(range(last, last + v)))
    last += v
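
The same running-offset idea can be sketched with `itertools.accumulate`, which computes the end position of every group up front. This assumes Python 3 and, like the loop above, a sorted input: the counts only say how many items each group has, so the ranges match the real positions only when equal values are contiguous and in ascending order. `index_ranges` is a hypothetical helper name.

```python
from collections import Counter
from itertools import accumulate


def index_ranges(values):
    """Turn group sizes of a sorted list into consecutive index ranges."""
    # Size of each group, in ascending order of the group value.
    sizes = [count for _, count in sorted(Counter(values).items())]
    # Running totals are the exclusive end positions of the groups.
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return [list(range(start, end)) for start, end in zip(starts, ends)]


li = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4]
print(index_ranges(li))
# [[0, 1, 2, 3, 4], [5, 6, 7], [8, 9], [10], [11, 12, 13]]
```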
Rick
  • This would be simpler, shorter, and more readable if you just initialized `last` and `result` before the loop and dispensed with the `try/except` within it. Also, `_` is usually used to name a variable that's not referenced anywhere else — clearly not the case here. – martineau Jun 23 '15 at 15:18
  • You could also use `result.append(range(last, last+v))`. – martineau Jun 23 '15 at 15:25
  • @martineau I think you're right. This was actually the impetus behind [a question](http://stackoverflow.com/questions/31004590/using-except-nameerror-for-initialization-of-variables) I asked earlier today after I posted this answer. I'll edit. – Rick Jun 23 '15 at 15:47
  • Another point against using `except NameError` this way (in this case, at least) is that, while not too expensive, there is some additional overhead involved with having `try/except` handling inside the loop. It might matter if there's 10000+ items... As for your use of `_` as a variable name, see [_What is the purpose of the single underscore “\_” variable in Python?_](http://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python) – martineau Jun 23 '15 at 19:32
  • @martineau Well, the block only hits `except` the very first time through. I suppose if you mean 10,000+ sets of data, you'd be right. I get what you're saying about `_`, but I tend to think that using it inside of generator expressions/list comprehension in the manner above is pretty harmless. It's just a placeholder, and it goes out of scope as soon as the expression is complete. Nobody cares what it's called. You've helped me understand that I'm bucking convention this way, though. – Rick Jun 23 '15 at 20:02
  • @martineau Made the edits as you suggested. I agree it's cleaner this way. – Rick Jun 23 '15 at 20:34
  • +1: I think your answer's even better now. However I don't think you understood what I was saying about `_`. The point was you weren't just using it as a placeholder, since its value _was_ being used via the `_ + last` part of the generator expression. That's a moot point now. However, the `k` in the `for` loop is a good candidate for this treatment since there are no other references to it anywhere. – martineau Jun 24 '15 at 00:04
  • @martineau thanks. Yup you're right about the k. I do understand, was just saying that since the underscore is only being used for building up the result inside the expression, it doesn't need a name. It's similar though not quite the same as a variable that's only referenced once and then not used again, except it's even more "disposable" because it goes out of scope immediately. – Rick Jun 24 '15 at 02:16
2

This can be done with the following expression:

>>> li = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4]
>>> [[i for i, n in enumerate(li) if n == x] for x in sorted(set(li))]
[[0, 1, 2, 3, 4], [5, 6, 7], [8, 9], [10], [11, 12, 13]]
mkrieger1
0

My implementation:

li = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4]
lout = []
lparz = []

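prev = li[0]                   # assumes li is not empty
for pos, el in enumerate(li):
    if el == prev:
        lparz.append(pos)      # still inside the current group
    else:
        lout.append(lparz)     # group ended: store it and start a new one
        lparz = [pos]
    prev = el

lout.append(lparz)             # store the final group
print(lout)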

outputs

[[0, 1, 2, 3, 4], [5, 6, 7], [8, 9], [10], [11, 12, 13]]

as required.
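
Since the question mentions performance on 10000+ items, a small `timeit` harness is one way to compare the approaches from this thread on your own data. This is a sketch for Python 3; the function names and the sample data (100 groups of 100 items) are made up for illustration, and no timings are claimed here because they depend on the machine and on the number of groups.

```python
import timeit
from itertools import groupby


def append_loop(values):
    # The question's approach: one bucket per index, filled in one pass.
    result = [[] for _ in range(max(values) + 1)]
    for pos, value in enumerate(values):
        result[value].append(pos)
    return result


def groupby_lengths(values):
    # Group lengths plus a running start offset.
    result, start = [], 0
    for _, run in groupby(values):
        size = len(list(run))
        result.append(list(range(start, start + size)))
        start += size
    return result


def rescan_comprehension(values):
    # Nested comprehension: rescans the whole list once per distinct value.
    return [[i for i, n in enumerate(values) if n == x]
            for x in sorted(set(values))]


# Hypothetical sample: sorted, consecutive, 100 groups of 100 items each.
sample = [group for group in range(100) for _ in range(100)]

candidates = [append_loop, groupby_lengths, rescan_comprehension]
expected = append_loop(sample)
assert all(func(sample) == expected for func in candidates)

for func in candidates:
    seconds = timeit.timeit(lambda: func(sample), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

The three functions agree only on sorted, consecutive input like the sample; on other inputs they group differently, so compare timings only for data shaped like yours.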

Pynchia