2

I have a list of values, and I want to append each value to the end of the corresponding list in a list of lists. Is there a Pythonic or efficient way to do this?

For example, given:

x = [['a','b','c'],['d','e','f'],['g','h','i']]
y = [1,2,3]

I would expect: [['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]

I've tried:

list(zip(x,y))

But this produces:

[(['a', 'b', 'c'], 1), (['d', 'e', 'f'], 2), (['g', 'h', 'i'], 3)]

I can solve it with an inefficient loop like this:

new_data = []
for i, sub in enumerate(x):  # use a separate name so the loop doesn't rebind x
    sub.append(y[i])
    new_data.append(sub)

print(new_data)
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
BrokenBenchmark
MattR

4 Answers

7

To build a new list, I might do:

>>> x = [['a','b','c'],['d','e','f'],['g','h','i']]
>>> y = [1,2,3]
>>> [a + [b] for a, b in zip(x, y)]
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]

If you want to modify x in place, you don't need enumerate; just loop over the zip and append each element of y to the corresponding sublist of x:

>>> for a, b in zip(x, y):
...     a.append(b)
...
>>> x
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
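
The difference between the two approaches above can be sketched as follows. This is only an illustration; the names new_x, originals_unchanged, first and same_object are made up for the example. The comprehension builds fresh sublists, while the loop mutates the sublists that x already holds.

```python
x = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
y = [1, 2, 3]

# Building new sublists leaves the originals untouched.
new_x = [a + [b] for a, b in zip(x, y)]
originals_unchanged = x == [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]

# Appending in place mutates the very same sublist objects.
first = x[0]
for a, b in zip(x, y):
    a.append(b)
same_object = x[0] is first

print(originals_unchanged, same_object, x == new_x)
```

Which one to prefer probably depends on whether other code still holds references to the original sublists.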
Samwise
1

You can unpack each sublist of x into a new list while pairing it with the matching element of y using zip():

[[*item1, item2] for item1, item2 in zip(x, y)]

For example:

x = [['a','b','c'],['d','e','f'],['g','h','i']]
y = [1,2,3]

print([[*item1, item2] for item1, item2 in zip(x, y)])

outputs:

[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
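
One possible reason to prefer unpacking, sketched below with made-up data: it accepts any iterable as the inner sequence, whereas concatenation with + requires both operands to be lists. The tuple inputs and the names unpacked and concatenated are assumptions for the example, not part of the question.

```python
x = [('a', 'b', 'c'), ('d', 'e', 'f')]  # tuples instead of lists (illustrative data)
y = [1, 2]

# Unpacking works for any iterable and always produces new lists.
unpacked = [[*item1, item2] for item1, item2 in zip(x, y)]

# Concatenation fails here because tuple + list is not allowed.
try:
    concatenated = [a + [b] for a, b in zip(x, y)]
except TypeError:
    concatenated = None

print(unpacked, concatenated)
```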
BrokenBenchmark
1

A potentially more efficient solution uses collections.deque and map to run list.append over the list/value pairs. Like the loop, it modifies x in place:

from collections import deque
deque(map(list.append, x, y), 0)

Benchmark (using 1000 times longer outer lists):

189 us  191 us  192 us  with_loop
 77 us   77 us   77 us  with_deque

The 0 is deque's maxlen argument: it tells deque to consume the iterator without storing anything, so the memory overhead is small and constant. It's also very fast, which is why it's used in itertools' consume recipe and in more-itertools' consume function.
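
As a minimal sketch of what happens step by step: the map object is lazy, so nothing is appended until something consumes it, and the zero-length deque does exactly that. The names pending, before, after and sink are made up for the illustration.

```python
from collections import deque

x = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
y = [1, 2, 3]

# map is lazy: no append has happened yet.
pending = map(list.append, x, y)
before = [len(sub) for sub in x]

# maxlen=0 makes deque pull every item and discard it immediately.
sink = deque(pending, maxlen=0)
after = [len(sub) for sub in x]

print(before, after, len(sink))
```

Note that map, like zip, stops at the shortest input, so extra sublists or extra values are silently ignored.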

Benchmark code:

from collections import deque
from timeit import repeat

def with_loop(x, y):
    for a, b in zip(x, y):
        a.append(b)

def with_deque(x, y):
    # maxlen=0: consume the map iterator without storing its results
    deque(map(list.append, x, y), 0)

funcs = with_loop, with_deque
tss = [[] for _ in funcs]  # one list of timings per function
for _ in range(20):
    for func, ts in zip(funcs, tss):
        # Rebuild fresh inputs each round: 3000 sublists and 3000 values
        x = [['a','b','c'],['d','e','f'],['g','h','i']]
        y = [1,2,3]
        scale = 1000
        x = [a[:] for _ in range(scale) for a in x]
        y *= scale
        t = min(repeat(lambda: func(x, y), number=1))
        ts.append(t)
for func, ts in zip(funcs, tss):
    # Print the three fastest timings per function, in microseconds
    print(*('%3d us ' % (t * 1e6) for t in sorted(ts)[:3]), func.__name__)
Kelly Bundy
  • Note this use of `deque` also appears in the `consume` recipe in the `itertools` documentation. – chepner Mar 31 '22 at 00:17
  • @chepner Yes, mentioned that and other things now. Normally I don't, as normally it's just part of my benchmarking code. But since I'm using it in a solution here, it makes sense to talk about it. – Kelly Bundy Mar 31 '22 at 00:38
1

Most of the solutions provided here perform similarly to each other and to your own.

The solution by @kellybundy is the only one that stands out, and I doubt you'll find a faster one, given how minimal it is and the fact that it already relies on Python's fast internals. (Please accept their answer, not this one, if you agree.)

Consider:

from copy import deepcopy
from timeit import timeit
from random import choice
from collections import deque

chars = 'abcdefghijkl'

texts = [[choice(chars) for _ in range(3)] for _ in range(1000)]
nums = list(range(1000))


def combine0(xss, ys):
    return xss  # only here to show the cost of any overhead


def combine1(xss, ys):
    result = []
    for i, xs in enumerate(xss):
        xs.append(ys[i])
        result.append(xs)
    return result


def combine2(xss, ys):
    return [xs + [y] for xs, y in zip(xss, ys)]


def combine3(xss, ys):
    return [[*xs, y] for xs, y in zip(xss, ys)]


def combine4(xss, ys):
    result = []
    for xs, y in zip(xss, ys):
        xs.append(y)
        result.append(xs)
    return result


def combine5(xss, ys):
    deque(map(list.append, xss, ys), 0)
    return xss


assert combine1(deepcopy(texts), nums) == combine2(deepcopy(texts), nums) == combine3(deepcopy(texts), nums) == combine4(deepcopy(texts), nums) == combine5(deepcopy(texts), nums)

for _ in range(10):
    for n, f in enumerate((combine0, combine1, combine2, combine3, combine4, combine5)):
        copies = iter([deepcopy(texts) for _ in range(1000)])
        time = timeit(lambda: f(next(copies), nums), number=1000) / 1000
        print(f.__name__, f'{time * 1e6 :6.2f} µs')
    print()

Result:

combine0   0.20 µs
combine1  82.28 µs
combine2  93.37 µs
combine3  73.44 µs
combine4  65.77 µs
combine5  16.27 µs

combine0   0.24 µs
combine1  75.62 µs
combine2  92.81 µs
combine3  91.56 µs
combine4  66.39 µs
combine5  17.73 µs

combine0   0.22 µs
combine1  84.68 µs
combine2  96.62 µs
combine3  87.32 µs
combine4  73.86 µs
combine5  15.44 µs

etc.

This shows that runtime varies quite a bit depending on all sorts of other factors, but there's a clear advantage for combine5, which uses @kellybundy's solution.

The combine0 lines show the performance of a function that does nothing, to show that we're actually measuring the performance of the functions and not just the overhead of the calls etc.

Note: the deepcopy calls are there to avoid modifying the same lists repeatedly. The copies are created before the timed run so that copying doesn't affect the measurement.

Kelly Bundy
Grismar
  • `deepcopy(texts)` is really slow, the whole benchmark runs 4x faster for me if I use `[text[:] for text in texts]` instead. – Kelly Bundy Mar 30 '22 at 22:07
  • That would get you a list with the original lists, which are being modified by some of the solutions, so each iteration would add to a growing list? `deepcopy` ensures the inner lists are copied as well? – Grismar Mar 30 '22 at 22:08
  • Since the test runs 1000 iterations, you're already getting millisecond values printed, unless I'm making a logic error I'm missing? I did update the call to `deque` and the print statement, agree there. – Grismar Mar 30 '22 at 22:09
  • The `[:]` makes copies. – Kelly Bundy Mar 30 '22 at 22:10
  • Ah sorry I see, yes I agree - however, I'm going to leave it, as that's just optimising the performance of the test, not affecting the functions themselves. – Grismar Mar 30 '22 at 22:12
  • I meant milliseconds for the timeit result (so yes, microseconds for each iteration). Though the unit wasn't really the point, I just find it easier to compare the times if I don't have to ignore zeros and search for the first non-zero digit. – Kelly Bundy Mar 30 '22 at 22:13
  • Well... for me, with `deepcopy`, it literally takes too long to work. Because I'm running other people's code on tio.run (so I feel safe without having to check for malicious stuff). And it has a time limit of 60 seconds, which this exceeds. I can reduce the repetitions/lengths, but that's work and the value suffers. And in general it's just annoying having to wait :-) – Kelly Bundy Mar 30 '22 at 22:17
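
The copying question raised in the comments above can be checked directly. The sketch below uses a small made-up texts list and the illustrative names shallow, per_row and deep. It suggests that slicing each inner list is enough here, since the inner lists hold only immutable strings.

```python
from copy import deepcopy

texts = [['a', 'b', 'c'], ['d', 'e', 'f']]

shallow = texts[:]                     # new outer list, same inner lists
per_row = [text[:] for text in texts]  # new outer list, new inner lists
deep = deepcopy(texts)                 # recursively copies everything

# Mutating the per-row copy or the deep copy leaves texts alone.
per_row[0].append(1)
deep[0].append(2)
texts_untouched = texts == [['a', 'b', 'c'], ['d', 'e', 'f']]

# Mutating through the shallow copy changes texts[0] as well.
shallow[0].append(3)

print(texts_untouched, texts[0])
```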