Combine lists having a specific merge order in a pythonic way?

Question

I would like to construct list x from two lists y and z. I want all elements from y be placed where ypos elements point. For example:

y = [11, 13, 15]
z = [12, 14]
ypos = [1, 3, 5]

So, x must be [11, 12, 13, 14, 15]

Another example:

y = [77]
z = [35, 58, 74]
ypos = [3]

So, x must be [35, 58, 77, 74]

I've written function that does what I want but it looks ugly:

def func(y, z, ypos):
    x = [0] * (len(y) + len(z))
    zpos = list(range(len(y) + len(z)))
    for i, j in zip(y, ypos):
        x[j-1] = i
        zpos.remove(j-1)
    for i, j in zip(z, zpos):
        x[j] = i
    return x

How to write it in pythonic way?

I guess you need to post this question to https://codereview.stackexchange.com/. — Tigran Saluev, Nov 22 '17 at 20:25
@TigranSaluev Changing the name would make it ok, so not really I think. — scharette, Nov 22 '17 at 20:35
@scharette He has working code and is looking to improve it's style to closer fit a language. That isn't within the scope of SO. — mascoj, Nov 22 '17 at 21:32

tobias_k · Accepted Answer · 2017-11-23T13:26:36.870

35

If the lists are very long, repeatedly calling insert might not be very efficient. Alternatively, you could create two iterators from the lists and construct a list by getting the next element from either of the iterators depending on whether the current index is in ypos (or a set thereof):

>>> ity = iter(y)
>>> itz = iter(z)
>>> syp = set(ypos)
>>> [next(ity if i+1 in syp else itz) for i in range(len(y)+len(z))]
[11, 12, 13, 14, 15]

Note: this will insert the elements from y in the order they appear in y itself, i.e. the first element of y is inserted at the lowest index in ypos, not necessarily at the first index in ypos. If the elements of y should be inserted at the index of the corresponding element of ypos, then either ypos has to be in ascending order (i.e. the first index of ypos is also the lowest), or the iterator of y has to be sorted by the same order as the indices in ypos (afterwards, ypos itself does not have to be sorted, as we are turning it into a set anyway).

>>> ypos = [5,3,1]   # y and z being same as above
>>> ity = iter(e for i, e in sorted(zip(ypos, y)))
>>> [next(ity if i+1 in syp else itz) for i in range(len(y)+len(z))]
[15, 12, 13, 14, 11]

edited Nov 23 '17 at 13:26

answered Nov 22 '17 at 20:38

tobias_k

81,265
12
120
179

2

Excellent approach. It mimics the way you'd distribute cards from two decks. – Eric Duminil Nov 22 '17 at 22:09
Can equivalently make `i+1` just `i` and change range to `range(1, len(y)+len(z)+1)` as it would remove `n - 1` `i+1` operations, but that's really just a nitpick micro optimization. – John B Nov 23 '17 at 02:06
It may be obvious, but one should ensure that `len(y) == len(ypos)` before running the list comprehension. – SethMMorton Nov 23 '17 at 07:02
2

It's not clear at first, but you basically sort `ypos` again since you iterate with an increasing `i`. You can try with `f([15, 13, 11], [12, 14], [5, 3, 1])`. It returns `[15, 12, 13, 14, 11]`, as if `ypos` was `[1, 3, 5]`. – Eric Duminil Nov 23 '17 at 08:38
1

@EricDuminil Ah, now I get what you mean. But `ypos` does not have to be sorted, instead my approach ignores any order of `ypos` and just adds the elements from `y` in the order they have, not at the "corresponding" position in `ypos`. Interesting point. – tobias_k Nov 23 '17 at 08:59
1

@tobias_k: Indeed. `ypos` can be written in any order you want but `y` should be written in the same order as a sorted `ypos`. :) – Eric Duminil Nov 23 '17 at 09:16
@EricDuminil Interesting discussion. It inspired me to add an answer that uses an alternative approach to preserving the corresponding positions. – John B Mar 03 '18 at 23:21

Joe Iddon · Answer 2 · 2017-11-22T20:29:36.270

12

You should use list.insert, this is what it was made for!

def func(y, z, ypos):
    x = z[:]
    for pos, val in zip(ypos, y):
        x.insert(pos-1, val)
    return x

and a test:

>>> func([11, 13, 15], [12, 14], [1,3,5])
[11, 12, 13, 14, 15]

edited Nov 22 '17 at 20:29

answered Nov 22 '17 at 20:27

Joe Iddon

20,101
7
33
54

1

since the question was about constructing a new list, you should copy `z` so it is not modifying the original list. – James Nov 22 '17 at 20:28
This assumes the indexes will be in order – C. Feenstra Nov 22 '17 at 20:30
@C.Feenstra The question assumes they are. It is impossible to know how they should be ordered (i.e. do we order just the `indexes`, or do we `zip` them, then order) so I decided this wasn't necessary. However if the OP explicitly states their position on this aspect then by all means, I will update the answer **:)** – Joe Iddon Nov 22 '17 at 20:34
Yes, the order in `x` must hold like in your answer – danielleontiev Nov 22 '17 at 20:40
If `ypos` might not be ordered, you could just use `sorted(ypos)` i.e. `for pos, val in zip(sorted(ypos), y):` – Arthur Tacca Nov 23 '17 at 07:54
@ArthurTacca: You cannot sort `ypos` without also reorganizing `y`. – Eric Duminil Nov 23 '17 at 08:39
BTW, this approach is fine with list of hundreds of elements. With larger lists, it becomes really slow. – Eric Duminil Nov 23 '17 at 10:35
@EricDuminil My mistake, it should be `sorted(zip(ypos, y))`. And your second comment is sort of my point; for a few hundred elements it's fine; often with this sort of question, that's the type of number involved. – Arthur Tacca Nov 28 '17 at 09:23

Eric Duminil · Answer 3 · 2017-11-23T14:02:23.910

With large lists, it might be a good idea to work with numpy.

Algorithm

create a new array as large as y + z
calculate coordinates for z values
assign y values to x at ypos
assign z values to x at zpos

The complexity should be O(n), with n being the total number of values.

import numpy as np

def distribute_values(y_list, z_list, y_pos):
    y = np.array(y_list)
    z = np.array(z_list)
    n = y.size + z.size
    x = np.empty(n, np.int)
    y_indices = np.array(y_pos) - 1
    z_indices = np.setdiff1d(np.arange(n), y_indices, assume_unique=True)
    x[y_indices] = y
    x[z_indices] = z
    return x

print(distribute_values([11, 13, 15], [12, 14], [1, 3, 5]))
# [11 12 13 14 15]
print(distribute_values([77], [35, 58, 74], [3]))
# [35 58 77 74]

As a bonus, it also works fine when ypos isn't sorted:

print(distribute_values([15, 13, 11], [12, 14], [5, 3, 1]))
# [11 12 13 14 15]
print(distribute_values([15, 11, 13], [12, 14], [5, 1, 3]))
# [11 12 13 14 15]

Performance

With n set to 1 million, this approach is a bit faster than @tobias_k's answer and 500 times faster than @Joe_Iddon's answer.

The lists were created this way:

from random import random, randint
N = 1000000
ypos = [i+1 for i in range(N) if random()<0.4]
y = [randint(0, 10000) for _ in ypos]
z = [randint(0, 1000) for _ in range(N - len(y))

Here are the results with %timeit and IPython:

%timeit eric(y, z, ypos)
131 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit tobias(y, z, ypos)
224 ms ± 977 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit joe(y,z, ypos)
54 s ± 1.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Nice timing analysis, but I'd say mine only takes 1.5 times longer (in fact, I got 135 and 179 ms respectively). :-P Surprised Joe's isn't slower, though, I'd expect it to be quadratic. — tobias_k, Nov 23 '17 at 13:38
@tobias_k: Yes, it depends on the system and underlying libraries. Factor 1.5 or 2 isn't much with such a large list. I was surprised by Joe 's answer with smaller lists. With `n=100`, it was actually a bit faster than yours for example. — Eric Duminil, Nov 23 '17 at 14:03

score 2 · Answer 4 · answered Nov 22 '17 at 20:48

Assuming that the ypos indices are sorted, here is another solution using iterators, though this one also supports ypos of unknown or infinite length:

import itertools

def func(y, ypos, z):
    y = iter(y)
    ypos = iter(ypos)
    z = iter(z)
    next_ypos = next(ypos, -1)
    for i in itertools.count(start=1):
        if i == next_ypos:
            yield next(y)
            next_ypos = next(ypos, -1)
        else:
            yield next(z)

John B · Answer 5 · 2018-03-03T23:24:04.443

If you want the elements in ypos to be placed at the x index where each element's index in ypos should correspond with the same y index's element:

Initialize x to the required size using all null values.
Iterate through the zipped y and ypos elements to fill in each corresponding y element into x.
Iterate through x and replace each remaining null value with z values where each replacement will choose from z in increasing order.

y = [11, 13, 15]
z = [12, 14]
ypos = [1, 5, 3]

x = [None] * (len(y) + len(z))
for x_ypos, y_elem in zip(ypos, y):
    x[x_ypos - 1] = y_elem

z_iter = iter(z)
x = [next(z_iter) if i is None else i for i in x]
# x -> [11, 12, 15, 14, 13]

PythonProgrammi · Answer 6 · 2017-12-29T06:27:10.320

Pythonic way

y = [11, 13, 15]
z = [12, 14]
ypos = [1, 3, 5]

x = z[:]

for c, n in enumerate(ypos):
    x.insert(n - 1, y[c])

print(x)

output

[11, 12, 13, 14, 15]

In a function

def func(y, ypos, z):
    x = z[:]
    for c,n in enumerate(ypos):
        x.insert(n-1,y[c])
    return x

print(func([11,13,15],[1,2,3],[12,14]))

outoput