Seperate lists based on indices

Question

I have 2 lists:

data = [0, 1, 2, 3, 7, 8, 9, 10]
indices = [1, 1, 0, 0, 0, 2, 1, 0]

I want to append the data to a 2-D array given the indices which correspond to the 2-D array. meaning:

new_list = [[]]*len(set(indices))

Where new_list will results as follows:

new_list = [[2,3,7,10],[0,1,9],[8]]

I am using this code:

for i in range(len(set(indices)):
    for j in range(len(indices)):
        if indices[j] == i:
            new_list[i].append(data[j])
        else:
            pass

However, I get this:

new_list = [[2, 3, 7, 10, 0, 1, 9, 8], [2, 3, 7, 10, 0, 1, 9, 8], [2, 3, 7, 10, 0, 1, 9, 8]]

I am not sure what mistake I am doing, any help is appreciated!

score 1 · Answer 1 · answered Oct 25 '18 at 02:40

You can use a dict to map the values to their respective indices, and then use a range to output them in order, so that this will only cost O(n) in time complexity:

d = {}
for i, n in zip(indices, data):
    d.setdefault(i, []).append(n)
newlist = [d[i] for i in range(len(d))]

newlist becomes:

[[2, 3, 7, 10], [0, 1, 9], [8]]

ShadowRanger · Accepted Answer · 2018-10-25T13:23:14.860

You're iterating your indices completely for every value, which is wasteful. You're also multiplying a list of lists, which doesn't do what you expect (it makes a list of many references to the same underlying list). You want to pair up indices and values instead (so you do O(n) work, not O(n**2)), which is what zip was made for, and make your list of empty lists safely (a list of several independent lists):

data = [0, 1, 2, 3, 7, 8, 9, 10]
indices = [1, 1, 0, 0, 0, 2, 1, 0]

# Use max because you really care about the biggest index, not the number of unique indices
# A list comprehension producing [] each time produces a *new* list each time
new_list = [[] for _ in range(max(indices)+1)]

# Iterate each datum and matching index in parallel exactly once
for datum, idx in zip(data, indices):
    new_list[idx].append(datum)

score 0 · Answer 3 · answered Oct 25 '18 at 01:35

0

To get at this, i zipped the data with its index:

>>>data = [0, 1, 2, 3, 7, 8, 9, 10]
>>>indices = [1, 1, 0, 0, 0, 2, 1, 0]

>>>buff = sorted(list(zip(indices,data)))
>>>print(buff)
[(0, 2), (0, 3), (0, 7), (0, 10), (1, 0), (1, 1), (1, 9), (2, 8)]

Then I used the set of unique indices as a way to determine if the data gets included in a new list. This is done with nested list comprehensions.

>>>new_list = list(list((b[1] for b in buff if b[0]==x)) for x in set(indices))    
>>>print(new_list)
[[2, 3, 7, 10], [0, 1, 9], [8]]

I hope this helps.

answered Oct 25 '18 at 01:35

rAntonioH

129
2
12

`sorted` implies construction of a new `list` already, explicitly constructing a `list` first means needlessly creating a temporary only to throw it away immediately. Just use `sorted(zip(indices,data))`. Your nested listcomp is also needlessly wasteful, performing `O(n*m)` (`m` being the number of unique indices) work when your presort means only `O(n)` follow-up work is actually needed, e.g. with `itertools.groupby`. Lastly, if `data` isn't necessarily in sorted order already, this loses `data`'s original ordering thanks to the sorting step. This isn't necessarily wrong, but it's suboptimal. – ShadowRanger Oct 25 '18 at 13:19

Seperate lists based on indices

3 Answers3