Insert missing elements in a nested list based on a condition - Python

Question

I have the nested list,

a = [(2,0),(3,0),(4,2),(10,3),(11,5)]

What I want to do is to add the inner-tuple (0,n) at position n, where n is the location of the missing element in a. The second element in each inner list should increase in increments of one, and if there is a gap then (0,n) should be inserted at that gap.

So the expected outcome for the list a is:

a_out = [(2,0),(3,0),(0,1),(4,2),(10,3),(0,4),(11,5)]

I.e since the first and second element in a is (3,0) and (4,2) the so a (0,1) is inserted between them.

My solution works, but I was wondering if there is a more pythonic way of going about it? I have been looking up Python's itertools library but I can't find a concise solution.

My code so far is:

l1 = [n[1] for n in a]
l2 = range(max(l1)+1)
l3 = [n for n in l2 if not in l1]


zeros = [0]*len(l3)
inserts = zip(zeros,l3)
a_full = a + inserts

a_out = sorted(a_full, key = itemgetter(1))

Can anyone suggest a better solution to this??

EDIT:

In general there may be many elements with the same second inner element (for example the (2,0) and (3,0) occuring in a). However, I can group and sum these together without lose of generality.

The nested list a can then be represented as,

a_sum = [(5,0),(4,2),(10,3),(11,5)]

By using the code,

a_group = [sum([x for x, y in group]) for key, group in groupby(a, key=itemgetter(1))]

a_sum = zip(output,list(set(l1)))

EDIT II:

The length of a is always 600, but depending on how the research goes this may increase to of order 10**3.

You sure you want that `(2, 0)` at the start of your list? There's no way to fix `(2, 0)` and `(3, 0)` appearing together by inserting items. — user2357112, Dec 11 '13 at 10:14
I've just added an edit to address your comment. I can fix this by grouping elements with same second inner element and then sum these together; the separate elements `(2,0)` and `(3,0)` can be combined into `(5,0)` wlog. — Holtz, Dec 11 '13 at 10:38

John La Rooy · Accepted Answer · 2013-12-11T10:53:53.160

You can do this in a nested list comprehension in O(n). Just add any missing entries in the nested part.

>>> a = [(2,0),(3,0),(4,2),(10,3),(11,5)]
>>> [k for i,j in enumerate(a, 1) for k in [j] + [(0,n) for n in range(j[1]+1, a[min(i, len(a)-1)][1])]]
[(2, 0), (3, 0), (0, 1), (4, 2), (10, 3), (0, 4), (11, 5)]

or

>>> [k for i,j in zip(a, a[1:]) for k in [i] + [(0,n) for n in range(i[1]+1, j[1])]] + a[-1:]
[(2, 0), (3, 0), (0, 1), (4, 2), (10, 3), (0, 4), (11, 5)]

If a is large, you can avoid the a[1:] slice by using an extra iterator on it

>>> a_iter = iter(a); next(a_iter)
(2, 0)
>>> [k for i,j in zip(a, a_iter) for k in [i] + [(0,n) for n in range(i[1]+1, j[1])]] + a[-1:]
[(2, 0), (3, 0), (0, 1), (4, 2), (10, 3), (0, 4), (11, 5)]

score 1 · Answer 2 · answered Dec 11 '13 at 11:01

1

This version combines the (2,0) and (3,0) into (5,0) as permitted in the comments

>>> from collections import defaultdict
>>> D = defaultdict(int)
>>> a = [(2,0),(3,0),(4,2),(10,3),(11,5)]
>>> for i,j in a:
...     D[j]+=i
...
>>> [(D[n], n) for n in range(a[0][1], a[-1][1]+1)]
[(5, 0), (0, 1), (4, 2), (10, 3), (0, 4), (11, 5)]

answered Dec 11 '13 at 11:01

John La Rooy

295,403
53
369
502

Which solution would you suggest I use? Your previous solution and my grouping method (in the edit) - or this solution on it's own? – Holtz Dec 11 '13 at 11:10
I'd expect the nested list comprehension is faster, but it is harder to read. Use which ever one suits you best. – John La Rooy Dec 11 '13 at 11:12
Many thanks @gnibbler. I think I'll go for the nested list comprehension since it is possible that this process may be optional, so keeping it separate may be easier to code. – Holtz Dec 11 '13 at 12:37

score 0 · Answer 3 · answered Dec 11 '13 at 10:30

0

a = [(2,0),(3,0),(4,2),(10,3),(11,5)]
i = 1
while i < len(a):
  if a[i-1][1] + 1 < a[i][1]:
    a.insert(i, (0, a[i-1][1]+1))
  i += 1

But you might want to consider using a different data type in general, maybe a defaultdict which seems to have a default value (0 in your case) at all places where nothing really is stored.

answered Dec 11 '13 at 10:30

Alfe

56,346
20
107
159

Agreed. Alas, the most Pythonic version isn't always the one with the best performance. – Alfe Dec 11 '13 at 11:19

unutbu · Answer 4 · 2013-12-11T11:01:09.603

0

import operator
a = [(2,0),(3,0),(4,2),(10,3),(11,5)]
seen = set([item[1] for item in a])
inserts = [(0, item) for item in range(max(seen)) if item not in seen]
a_out = sorted(a + inserts, key=operator.itemgetter(1))
print(a_out)

yields

[(2, 0), (3, 0), (0, 1), (4, 2), (10, 3), (0, 4), (11, 5)]

The above O(n log n) solution preserves the behavior of the code you posted. If we can also assume that the second item in the tuples of a are always non-decreasing, then there are better O(n) (one-pass) solutions such as:

a = [(2,0),(3,0),(4,2),(10,3),(11,5)]
result = a[:1]
for item in a[1:]:
    result.extend(
        [(0,i) for i in range(result[-1][1]+1, item[1])] + [item])

edited Dec 11 '13 at 11:01

answered Dec 11 '13 at 10:31

unutbu

842,883
184
1,785
1,677

Yes, I can assume that the second item in the tuples of `a` are always non-decreasing. The are all increasing but, a priori, I do not know the behavior in that increasing pattern. Given this - what are the better `O(n)` solutions? – Holtz Dec 11 '13 at 10:48
@Holtz, anything using `sort` isn't O(n) – John La Rooy Dec 11 '13 at 10:53
@gnibbler, I don't need to use `sort` - the data that comes in is already sorted. – Holtz Dec 11 '13 at 11:09
@Holtz, I meant that any answers using `sort` are not O(n). Not that it's necessary to use sort – John La Rooy Dec 11 '13 at 11:10

score 0 · Answer 5 · answered Dec 11 '13 at 10:35

0

Why not just use a small and readable function:

def fill(seq):
    """
    >>> list(fill([(2, 0), (3, 0), (4, 2), (10, 3), (11, 5)]))
    [(2, 0), (3, 0), (0, 1), (4, 2), (10, 3), (0, 4), (11, 5)]
    """
    prev = None
    for value, key in seq:
        if prev != None:
            while key > prev + 1:
                prev += 1
                yield (0, prev)
        yield (value, key)
        prev = key

if __name__ == '__main__':
    import doctest
    doctest.testmod()

answered Dec 11 '13 at 10:35

sloth

99,095
21
171
219

thanks for your response. I've having some difficulty in interpreting your code though, could it be possible if you could provide a brief explanation? Many thanks. – Holtz Dec 11 '13 at 20:41
Step through each value/key-pair in the sequence. Set `prev` to `key` on every iteration. If the difference between the current `key` and the previous `key` is larger than `1`, fill in the missing values with `(0, "missing_key")`. – sloth Dec 12 '13 at 07:43

score 0 · Answer 6 · answered Dec 11 '13 at 11:05

>>> from collections import defaultdict
>>> D = defaultdict(list)
>>> a = [(2,0),(3,0),(4,2),(10,3),(11,5)]
>>> for i,j in a:
...     D[j].append(i)
...
>>> [(z, n) for n in range(a[0][1], a[-1][1]+1) for z in D[n] or [0]]
[(2, 0), (3, 0), (0, 1), (4, 2), (10, 3), (0, 4), (11, 5)]

Insert missing elements in a nested list based on a condition - Python

6 Answers6