13

I want to merge two lists in Python, with the lists being of different lengths, so that the elements of the shorter list are as equally spaced within the final list as possible.

i.e. I want:

l1 = [1, 2, 3, 4]
l2 = ['a', 'b']

output = [1, 'a', 2, 3, 'b', 4]

It needs to be able to function with lists that aren't exact multiples too, so it could take:

l1 = [1, 2, 3, 4, 5]
l2 = ['a', 'b', 'c']

and produce [1, 'a', 2, 'b', 3, 'c', 4, 5] or similar.

It needs to preserve the ordering of both lists.

I can see how to do this by a long-winded brute force method but since Python seems to have a vast array of excellent tools to do all sorts of clever things which I don't know about (yet), I wondered whether there's anything more elegant I can use?


If you want regular interleaving (equal-spaced), see How to interleave two lists of different length?.

Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
Jack Aidley
  • 19,439
  • 7
  • 43
  • 70

9 Answers9

12

Borrowing heavily from Jon Clements' solution, you could write a function that takes an arbitrary number of sequences and returns merged sequence of evenly-spaced items:

import itertools as IT

def evenly_spaced(*iterables):
    """
    >>> evenly_spaced(range(10), list('abc'))
    [0, 1, 'a', 2, 3, 4, 'b', 5, 6, 7, 'c', 8, 9]
    """
    return [item[1] for item in
            sorted(IT.chain.from_iterable(
            zip(IT.count(start=1.0 / (len(seq) + 1), 
                         step=1.0 / (len(seq) + 1)), seq)
            for seq in iterables))]

iterables = [
    ['X']*2,
    range(1, 11),
    ['a']*3
    ]

print(evenly_spaced(*iterables))

yields

[1, 2, 'a', 3, 'X', 4, 5, 'a', 6, 7, 'X', 8, 'a', 9, 10]
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • Sorting isn't stable. You can fix it by putting `operator.itemgetter(0)` as the sort key. BTW I posted [my own answer](https://stackoverflow.com/a/59594546/4518341) that includes the fix. – wjandrea Jan 04 '20 at 20:26
10

This is basically the same as Bresenham's line algorithm. You can calculate "pixel" positions and use them as the indices into the lists.

Where your task differs is that you only want each element to show up once. You'd need to either modify the algorithm or post-process the indices, appending the elements from the lists only the first time they appear. There is a slight ambiguity, though: when both pixel/list indices change at the same time, you'll need to pick which one to include first. This corresponds to the two different options for interleaving the lists that are mentioned in the question and a comment.

Michael J. Barber
  • 24,518
  • 9
  • 68
  • 88
7

With the assumption that a is the sequence to be inserted into:

from itertools import izip, count
from operator import itemgetter
import heapq

a = [1, 2, 3, 4]
b = ['a', 'b']

fst = enumerate(a)
snd = izip(count(0, len(a) // len(b)), b)
print map(itemgetter(1), heapq.merge(fst, snd))
# [1, 'a', 2, 3, 'b', 4]
Jon Clements
  • 138,671
  • 33
  • 247
  • 280
  • @JackAidley ugh - of course in Python 3.x you will - one sec ;) – Jon Clements Oct 10 '13 at 12:13
  • clever. For python3 it's probably neater to use a list comprehension than `list(map(...))` – John La Rooy Oct 10 '13 at 12:49
  • Update Python 3.5+, the latter will not work. `heapq.merge()` accepts a [`key` function](https://docs.python.org/3.8/library/heapq.html#heapq.merge), e.g. `[x for i, x in heapq.merge(fst, snd, key=lambda x: x[0])]` – pylang Oct 11 '18 at 02:10
6

if a is the longer list and b is the shorter

from itertools import groupby

len_ab = len(a) + len(b)
groups = groupby(((a[len(a)*i//len_ab], b[len(b)*i//len_ab]) for i in range(len_ab)),
                 key=lambda x:x[0])
[j[i] for k,g in groups for i,j in enumerate(g)]

eg

>>> a = range(8)
>>> b = list("abc")
>>> len_ab = len(a) + len(b)
>>> groups = groupby(((a[len(a)*i//len_ab], b[len(b)*i//len_ab]) for i in range(len_ab)), key=lambda x:x[0])
>>> [j[i] for k,g in groups for i,j in enumerate(g)]
[0, 'a', 1, 2, 'b', 3, 4, 5, 'c', 6, 7]

You can use this trick to make sure a is longer than b

b, a = sorted((a, b), key=len)
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
  • This was the solution I went with in the end although I amended the final line to `[j[i] for _,g in groups for i,j in enumerate(g)]` so as to suppress a warning about an unused variable. – Jack Aidley Oct 14 '13 at 09:50
4

If we modify @Jon's answer like this

from itertools import count
import heapq

[x[1] for x in heapq.merge(izip(count(0, len(b)), a), izip(count(0, len(a)), b))]

It doesn't matter which of a/b is longest

John La Rooy
  • 295,403
  • 53
  • 369
  • 502
2

I like unutbu's answer but not the nested style, so I rewrote it. While I was there I noticed the sorting wasn't stable, so I fixed it using operator.itemgetter.

I also replaced itertools.count with enumerate because it's more intuitive. As a bonus it should also be more accurate for large inputs, though I haven't tested it.

import itertools
import operator

def distribute(sequence):
    """
    Enumerate the sequence evenly over the interval (0, 1).

    >>> list(distribute('abc'))
    [(0.25, 'a'), (0.5, 'b'), (0.75, 'c')]
    """
    m = len(sequence) + 1
    for i, x in enumerate(sequence, 1):
        yield i/m, x

def intersperse(*sequences):
    """
    Evenly intersperse the sequences.

    Based on https://stackoverflow.com/a/19293603/4518341

    >>> list(intersperse(range(10), 'abc'))
    [0, 1, 'a', 2, 3, 4, 'b', 5, 6, 7, 'c', 8, 9]
    >>> list(intersperse('XY', range(10), 'abc'))
    [0, 1, 'a', 2, 'X', 3, 4, 'b', 5, 6, 'Y', 7, 'c', 8, 9]
    >>> ''.join(intersperse('hlwl', 'eood', 'l r!'))
    'hello world!'
    """
    distributions = map(distribute, sequences)
    get0 = operator.itemgetter(0)
    for _, x in sorted(itertools.chain(*distributions), key=get0):
        yield x

Note that there's one difference from your second example, where 'b' and 'c' get moved down:

>>> list(intersperse(range(1, 6), 'abc'))
[1, 'a', 2, 3, 'b', 4, 'c', 5]
wjandrea
  • 28,235
  • 9
  • 60
  • 81
1

If we want to do this without itertools:

def interleave(l1, l2, default=None):  
    max_l = max(len(l1), len(l2))
    data  = map(lambda x: x + [default] * (max_l - len(x)), [l1,l2])
    return [data[i%2][i/2] for i in xrange(2*max_l)]

Ahh, missed the equally spaced part. This was for some reason marked as duplicate with a question that didn't require equally spaced in the presence of differing list lengths.

Patrick
  • 127
  • 1
  • 3
  • Thanks for answering a question and welcome to SO. For your future reference, your edited comment probably isn't necessary (you could have posted it when you made your edit there) or you could indicated that you edited post, e.g., "**Edit** I missed ..." – Richard Erickson Jun 17 '15 at 23:47
1

A variant of @Jon Clements answer using more_itertools.collate with explanation.

Given

import itertools as it

import more_itertools as mit


a, b = range(1, 5), ["a", "b"]

Code

first = enumerate(a)
second = zip(it.count(0, len(a) // len(b)), b)
[x for i, x in mit.collate(first, second, key=lambda x: x[0])]
# [1, 'a', 2, 3, 'b', 4] 

Details

This answer is updated to use with Python 3.

first and second are iterables of tuples, each tuple comprising a position-element pair.

list(first)
# [(0, 1), (1, 2), (2, 3), (3, 4)]

list(second)
# [(0, 'a'), (2, 'b')]

more_itertools.collate() wraps heapq.merge(), which merges the pre-sorted first and second iterables in order. In the final list comprehension, the key is the sorting function while the last element in each tuple in returned.

See Also

Install this third-party package via > pip install more_itertools

pylang
  • 40,867
  • 14
  • 129
  • 121
0

This function will generate an even mixture of any number of lists, without relying on the expensive sort

from typing import Any, Iterable, List

def evenly_mix(*lists: List[Any]) -> Iterable[Any]:
    m = max(len(b) for b in lists)
    deltas = [len(b) / m for b in lists]
    cnts, psums = [[0 for _ in lists] for _ in [0, 0]]
    for i in range(m):
        for j, b in enumerate(lists):
            psums[j] += deltas[j]
            if psums[j] >= cnts[j] + 1:
                yield b[cnts[j]]
                cnts[j] += 1
John Jiang
  • 827
  • 1
  • 9
  • 19