10

The question is to print all possible interleavings of two given strings. So I wrote a working code in Python which runs like this:

def inter(arr1,arr2,p1,p2,arr):
    thisarr = copy(arr)
    if p1 == len(arr1) and p2 == len(arr2):
        printarr(thisarr)
    elif p1 == len(arr1):
        thisarr.extend(arr2[p2:])
        printarr(thisarr)
    elif p2 == len(arr2):
        thisarr.extend(arr1[p1:])
        printarr(thisarr)
    else:
        thisarr.append(arr1[p1])
        inter(arr1,arr2,p1+1,p2,thisarr)
        del thisarr[-1]
        thisarr.append(arr2[p2])
        inter(arr1,arr2,p1,p2+1,thisarr)
    return

It comes at each point in a string, then for one recursive call, it considers the current element as belonging to the first array, and in the next call, belonging to the other array. So if the input strings are ab and cd, it prints out abcd, acbd, cdab, cabd, etc. p1 and p2 are pointers to the arrays (Because Python strings are immutable, I am using arrays!). Can anybody tell me, what is this code's complexity, and if it can be improved or not? I have written a similar code to print all combinations of length k from a given array:

def kcomb(arr,i,thisarr,k):
     thisarr = copy(thisarr)
     j,n = len(thisarr),len(arr)
     if n-i<k-j or j >k:
        return
     if j==k:
        printarr(thisarr)
        return
     if i == n:
         return
     thisarr.append(arr[i])
     kcomb(arr,i+1,thisarr,k)
     del thisarr[-1]
     kcomb(arr,i+1,thisarr,k)
     return

This too, works on the same principle. So in general, how to find the complexity of such functions, and how do I optimize them? Is it possible to do these by DP? Sample input-output for the first problem:

>>> arr1 = ['a','b','c']
>>> arr2 = ['d','e','f']
>>> inter(arr1,arr2,0,0,[])
abcdef
abdcef
abdecf
abdefc
adbcef
adbecf
adbefc
adebcf
adebfc
adefbc
dabcef
dabecf
dabefc
daebcf
daebfc
daefbc
deabcf
deabfc
deafbc
defabc
Cœur
  • 37,241
  • 25
  • 195
  • 267
SexyBeast
  • 7,913
  • 28
  • 108
  • 196
  • can you please post the code for "printarr" – Inbar Rose Oct 11 '12 at 09:30
  • Yeah I can, but that's trivial, it just takes the array as input, concatenates all the elements into a string and prints it out! – SexyBeast Oct 11 '12 at 09:31
  • 1
    It's hard to grasp what you're doing. Can you post the input and expected output. – monkut Oct 11 '12 at 09:37
  • if i understand correctly - you are have two strings, and you want to see all the combinations using all the characters in each string that you can make? ie. you have two character arrays, and you want to combine them in every way possible. ie, you have a list of items and you want to find each way to combine them? – Inbar Rose Oct 11 '12 at 09:46
  • 2
    Almost, but preserving the order of individual strings. That is, if string1 is `abcd`, and string2 is `efgh`, then in the interleaved string, `a' should come before `b`, which should come before `c`, which should come before `d`. In between these, characters from string2 should fit in, but on the same condition, preserving order of individual letters. – SexyBeast Oct 11 '12 at 09:49
  • so basically you want `aebfcgdh` or `abefcgdh` or `abcefghd` etc..? – Inbar Rose Oct 11 '12 at 09:52
  • it would really help if you could give us some `input` and `output` of your current working code. – Inbar Rose Oct 11 '12 at 09:57
  • Please see the comments below. – SexyBeast Oct 11 '12 at 10:06

4 Answers4

27

Your problem can be reduced to that of creating all unique permutations of a particular list. Say A and B are the lengths of the strings arr1 and arr2, respectively. Then construct a list like this:

[0] * A + [1] * B

There exists a one-to-one correspondence (a bijection) from the unique permutations of this list to all the possible interleavings of the two strings arr1 and arr2. The idea is to let each value of the permutation specify which string to take the next character from. Here is an example implementation showing how to construct an interleaving from a permutation:

>>> def make_interleave(arr1, arr2, permutation):
...     iters = [iter(arr1), iter(arr2)]
...     return "".join(iters[i].next() for i in permutation)
... 
>>> make_interleave("ab", "cde", [1, 0, 0, 1, 1])
'cabde'

I found this question in the python mailing list which asks how to solve this problem in an efficient manner. The answers suggest using an algorithm which is described in Knuth's The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Permutations. I found an online pdf of the draft here. The algorithm is also described in this wikipedia article.

Here's my own annotated implementation of the next_permutation algorithm, as a python generator function.

def unique_permutations(seq):
    """
    Yield only unique permutations of seq in an efficient way.

    A python implementation of Knuth's "Algorithm L", also known from the 
    std::next_permutation function of C++, and as the permutation algorithm 
    of Narayana Pandita.
    """

    # Precalculate the indices we'll be iterating over for speed
    i_indices = list(range(len(seq) - 1, -1, -1))
    k_indices = i_indices[1:]

    # The algorithm specifies to start with a sorted version
    seq = sorted(seq)

    while True:
        yield seq

        # Working backwards from the last-but-one index,           k
        # we find the index of the first decrease in value.  0 0 1 0 1 1 1 0
        for k in k_indices:
            if seq[k] < seq[k + 1]:
                break
        else:
            # Introducing the slightly unknown python for-else syntax:
            # else is executed only if the break statement was never reached.
            # If this is the case, seq is weakly decreasing, and we're done.
            return

        # Get item from sequence only once, for speed
        k_val = seq[k]

        # Working backwards starting with the last item,           k     i
        # find the first one greater than the one at k       0 0 1 0 1 1 1 0
        for i in i_indices:
            if k_val < seq[i]:
                break

        # Swap them in the most efficient way
        (seq[k], seq[i]) = (seq[i], seq[k])                #       k     i
                                                           # 0 0 1 1 1 1 0 0

        # Reverse the part after but not                           k
        # including k, also efficiently.                     0 0 1 1 0 0 1 1
        seq[k + 1:] = seq[-1:k:-1]

Each yield of the algorithm has an amortized complexity of O(1), according to this question, but according to rici who commented below, this is only the case if all numbers are unique, which they are definately not in this case.

In any case, the number of yields provides a lower bound for the time complexity, and it is given by

(A + B)! / (A! * B!)

Then to find the real time complexity we need to sum the average complexity of each yield with the complexity of constructing the resulting string based on the permutation. If we multiply this sum with the above formula we get the total time complexity.

CrazyChucky
  • 3,263
  • 4
  • 11
  • 25
Lauritz V. Thaulow
  • 49,139
  • 12
  • 73
  • 92
  • @cupidvogel As I said, the algorithm is explained in the same blog where the code is. – Lauritz V. Thaulow Oct 11 '12 at 11:14
  • @lazyr: that code is only O(1) amortized for permutations of sequences of unique elements. If the sequence has a lot of repetitions of one value, it becomes (I think) O(n) (for each permutation). (It's exercise 6 in the Knuth fascicle.) – rici Oct 11 '12 at 17:37
  • @cupidvogel I've replaced the code with my own annotated reimplementation. Hopefully it will aid you in understanding the algorithm and the code. – Lauritz V. Thaulow Oct 15 '12 at 20:06
  • Nice algorithm. Just to be pedantic, the "most efficient" way of swapping variables (in python) is to not use the `tmp` variable at all. You can swap directly, via: `seq[k], seq[i] = seq[i], seq[k]` – Gerrat Aug 20 '14 at 13:01
  • @Gerrat I mistakenly assumed using the tuple swapping method would incur a performance penalty. I see I was wrong. I've updated the code, thanks for the heads up! – Lauritz V. Thaulow Oct 10 '14 at 19:29
  • 6
    I ran into an issue with this algorithm: if I test it with `orig = [1,1,1,2,2,3]; list(unique_permutations(orig))`, it always accesses the original object and thus the result in the end is not correct. Maybe add to the post that you can call `import copy; [copy.copy(x) for x in unique_permutations(orig)]` – Roelant Oct 17 '18 at 12:20
  • @Roelant I don't observe the issue you describe with `orig = ['0','0','1','1','1','1','2','2','2','2','3','4']` with Python 3.11.4 on Windows 10 (64bit). Can you please explain/re-evaluate? I think the [edit to revision 11](https://stackoverflow.com/revisions/12837695/11) by @CrazyChucky fixed it. – Wolf Aug 26 '23 at 11:14
1

okay, after some work, and usign advice from other answers. mainly lazyr. (and have now converted it into a class) the __all_perms is from: https://stackoverflow.com/a/104436/1561176

class Interleave():

    def __init__(self, A, B):
        self.A = A
        self.B = B
        self.results = list(self.__interleave())

    # from https://stackoverflow.com/a/104436/1561176
    def __all_perms(self, elements):
        if len(elements) <=1:
            yield elements
        else:
            for perm in self.__all_perms(elements[1:]):
                for i in range(len(elements)):
                    #nb elements[0:1] works in both string and list contexts
                    yield perm[:i] + elements[0:1] + perm[i:]

    def __sequences(self):
        return list( sorted( set(
            ["".join(x) for x in self.__all_perms(['a'] * len(self.A) + ['b'] * len(self.B))] ) ) )

    def __interleave(self):
        for sequence in self.__sequences():
            result = ""
            a = 0
            b = 0
            for item in sequence:
                if item == 'a':
                    result+=self.A[a]
                    a+=1
                else:
                    result+=self.B[b]
                    b+=1
            yield result

    def __str__(self):
        return str(self.results)

    def __repr__(self):
        return repr(self.results)

and here is the usage:

>>> A = ['a', 'b', 'c']
>>> B = ['d', 'e', 'f']
>>> Interleave(A, B)
['abcdef', 'abdcef', 'abdecf', 'abdefc', 'adbcef', 'adbecf', 'adbefc', 'adebcf', 'adebfc', 'adefbc', 'dabcef', 'dabecf', 'dabefc', 'daebcf', 'daebfc', 'daefbc', 'deabcf', 'deabfc', 'deafbc', 'defabc']

also, you can access the Class members like:

>>> inter = Interleave(A, B)
>>> inter.results
['abcdef', 'abdcef', 'abdecf', 'abdefc', 'adbcef', 'adbecf', 'adbefc', 'adebcf', 'adebfc', 'adefbc', 'dabcef', 'dabecf', 'dabefc', 'daebcf', 'daebfc', 'daefbc', 'deabcf', 'deabfc', 'deafbc', 'defabc']
>>> inter.A
['a', 'b', 'c']
>>> inter.B
['d', 'e', 'f']
Community
  • 1
  • 1
Inbar Rose
  • 41,843
  • 24
  • 85
  • 131
  • @Cupidvogel well, the `all_perms()` is from an answer: http://stackoverflow.com/a/104436/1561176 and so i wont start discussing it here as it is discussed there. `_sequences()` simply calls the `all_perms()` and then converts the resulting lists to strings, makes them uniques, and sorts them, before converting them to a list and sending them to the `interleave()` which reads the lists and simply adds values from `A` or `B` depending on the order it is given from the sequence. – Inbar Rose Oct 11 '12 at 11:16
  • 2
    Note that this algorithm generates *all* permutations and then filters out the duplicates, unlike the algorithm in my answer. For two strings of length 10 for example, `__all_perms` yields 2432902008176640000 times, while `next_permutation` only yields the necessary 184756 times. – Lauritz V. Thaulow Oct 11 '12 at 12:00
  • @lazyr yes, you are right. best to use a combination of both answers in this case. when i wrote this answer i didnt see the code you wrote, only the link you provided where someone had a similar problem and your idea of making unique combinations. i suppose i hadn't thought about the scaling. – Inbar Rose Oct 11 '12 at 12:42
0

Does permutations work for you? Or, is this coding practice?

>>> from itertools import permutations
>>> s1 = "ab"
>>> s2 = "cd"
>>> all_values = [c for c in s1 + s2]
>>> ["".join(r) for r in permutations(all_values)]
['abcd', 'abdc', 'acbd', 'acdb', 'adbc', 'adcb', 'bacd', 'badc', 'bcad', 'bcda', 'bdac', 'bdca', 'cabd', 'cadb', 'cbad', 'cbda', 'cdab', 'cdba', 'dabc', 'dacb', 'dbac', 'dbca', 'dcab', 'dcba']
monkut
  • 42,176
  • 24
  • 124
  • 155
  • No no, I am trying to develop a working code without resorting to these in-built functions. Plus your output is not correct, in many of the output strings, `b` comes before `a`, whereas `a` should always come before `b`. That is interleaving, not permutation! – SexyBeast Oct 11 '12 at 09:50
  • I believe a subset of the permuted outputs is required here -- where the 2 input strings are interleaved in order. So, abdc is not a valid output – Dhara Oct 11 '12 at 09:51
  • @Cupidvogel You're not interested at all in a solution using itertools? – Lauritz V. Thaulow Oct 11 '12 at 09:52
  • 1
    @Cupidvogel What do you have against "resorting" to built-in functions? – Dhara Oct 11 '12 at 09:53
  • I want to do it myself with the basic functionalities. If I resort to `itertools`, the entire work is done by that, no? – SexyBeast Oct 11 '12 at 09:54
  • I am not against them per se. But I want to be able to do it with just sheer basic coding. – SexyBeast Oct 11 '12 at 09:54
  • Ok, I see, I didn't see the ordered part initially. – monkut Oct 11 '12 at 09:57
0

This is what I think you're trying to do:

from itertools import product, chain

def interleave(a, b):
    if len(b) > len(a):
        a, b = b, a

    boolean = (True, False)
    for z in range(len(a) - len(b) + 1):
        pairs = list(zip(a[z:], b))
        for vector in product(*[boolean] * max(map(len, (a, b)))):
            permutation = (pair if i else reversed(pair)
                           for i, pair in zip(vector, pairs))
            yield a[:z] + ''.join(chain.from_iterable(permutation))
Joel Cornett
  • 24,192
  • 9
  • 66
  • 88
  • Can you have a look at my solution and tell me the complexity. I a looking for a non-recursive solution without using `itertools`. – SexyBeast Oct 11 '12 at 09:57
  • Well, you could easily modify this to not use itertools. `product` in this case just iterates over all the values between `0` and `2^len(a)` in binary, you could probably just do `range(2^len(a)):` and convert to binary. I just realized though, that this code leaves out certain interleaves. It can be modified to include them easily however. I may be able to revisit this in a few hours. – Joel Cornett Oct 11 '12 at 10:04