5

Situation

To begin with, you have an array/list A, then you want to transform it to an expected array/list B given. The only actions that you can apply on array are InsertAt and DeleteAt where they are able to insert and delete an element at certain index from list.

note: Array B is always sorted while Array A may not be.

For instance, you have an array A of [1, 4, 3, 6, 7]

and you want it to become [2, 3, 4, 5, 6, 6, 7, 8]

One way of doing that is let A undergo following actions:

    deleteAt(0); // which will delete element 1, arrayA now [4, 3, 6, 7]
    deleteAt(0); // delete element 4 which now at index 0
                 // array A now [3, 6, 7]
    insertAt(0, 2); // Insert value to at index 0 of array A
                    // array A now [2, 3, 6, 7]
    insertAt(2, 4); // array now [2, 3, 4, 6, 7]
    insertAt(3, 5); // array Now [2, 3, 4, 5, 6, 7]
    insertAt(5, 6); // array Now [2, 3, 4, 5, 6, 6, 7]
    insertAt(7, 8); // array Now [2, 3, 4, 5, 6, 6, 7, 8]

On the above example, 7 operations were done on array A to transform it to array we wanted.

Hence, how do we find the what are the steps to transform A to B, as well as the minimum steps? Thanks!

btw, solution of deleting all element at A then add everything from B to A is only applicable when A & B have nothing in common.

My thoughts

What I have done so far:

  1. Compare the array A and array B, then save delete all the elements in Array A that can't be found in array B.
  2. Find the longest increasing subsequence from the common list of A and B.
  3. delete All elements that are not in Longest increasing sub sequence.
  4. compare what is left with B, then add elements accordingly.

However, I'm struggling from implementing that..

Change log

  1. fixed a typo of missing out element 7, now least operation is 7.
  2. Added MY THOUGHTS section
  3. There was a answer that elaborated on Levenshtein distance (AKA min edit distance), somehow it disappeared.. But I found that really useful after reading git/git levenshtein.c file, it seems to be a faster algorithm then what I already have. However, I'm not sure will that algorithm give me the detailed steps, or it is only capable of giving min num of steps.
billz
  • 61
  • 5
  • Nice problem! Show us what you've tried so far. – SolarBear Dec 16 '16 at 21:32
  • Your problem is similar to computing the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between the strings. It differs in that Levenshtein distance allows single-character substitutions as operations, too. You could consider studying some of the many implementations of Levenshtein distance calculators for ideas. – John Bollinger Dec 16 '16 at 21:34
  • Only 7 operations are needed. Delete `1` and `3` leaving `4, 6, 7` (2 operations). Then insert `2` and `3` and `5` and `6` and `8` (5 operations) total 7 operations. – Weather Vane Dec 16 '16 at 21:37
  • 1
    @SolarBear ,What I have done so far: **1.** Compare the array A and array B, then save delete all the elements in Array A that can't be found in array B. **2.** Find the longest increasing subsequence from the common list of A and B. **3.** delete All elements that are not in Longest increasing sub sequence. **4**, compare what is left with B, then add elements accordingly. I'm struggling to implement that in code though – billz Dec 16 '16 at 21:55
  • @JohnBollinger Noted, will definitely check that out. Thanks for the info man. – billz Dec 16 '16 at 21:56
  • @WeatherVane Thanks for spotting out! Actually mine should be 7 as well, but I made a typo. leaving `4, 6, 7` will be same as leaving `3, 6, 7` in the list, because they are both longest subsequence in the list. I inserted `7` again at the end.. fixing it, thx. – billz Dec 16 '16 at 21:59
  • Are arrays A and B always sorted as in your example ? – Gribouillis Dec 16 '16 at 22:14
  • @Gribouillis Array B is always sorted, while array A may not be. Array A can be either sorted or non sorted. Thanks for asking, will add it to spec. – billz Dec 16 '16 at 22:21
  • *must* the result be sorted? – wildplasser Dec 20 '16 at 00:36
  • @wildpasser yes, It has to be identical as `arrayB`, which means it has to be sorted, since B is arleady sorted. – billz Dec 20 '16 at 01:02

2 Answers2

2

I have a python program that seems to work, but it is not very short

__version__ = '0.2.0'

class Impossible(RuntimeError): pass

deleteAt = 'deleteAt'
insertAt = 'insertAt' 
incOffset = 'incOffset'

def remove_all(size):
    return [(deleteAt, i, None) for i in range(size-1, -1, -1)]

def index_not(seq, val):
    for i, x in enumerate(seq):
        if x != val:
            return i
    return len(seq)

def cnt_checked(func):
    """Helper function to check some function's contract"""
    from functools import wraps
    @wraps(func)
    def wrapper(src, dst, maxsteps):
        nsteps, steps = func(src, dst, maxsteps)
        if nsteps > maxsteps:
            raise RuntimeError(('cnt_checked() failed', maxsteps, nsteps))
        return nsteps, steps
    return wrapper

@cnt_checked
def strategy_1(src, dst, maxsteps):
    # get dst's first value from src
    val = dst[0]
    try:
        index = src.index(val)
    except ValueError:
        raise Impossible

    # remove all items in src before val's first occurrence
    left_steps = remove_all(index)
    src = src[index:]
    n = min(index_not(src, val), index_not(dst, val))
    score = len(left_steps)
    assert n > 0
    left_steps.append([incOffset, n, None])
    right_steps = [[incOffset, -n, None]]
    nsteps, steps = rec_find_path(src[n:], dst[n:], maxsteps - score)
    return (score + nsteps, (left_steps + steps + right_steps))

@cnt_checked
def strategy_2(src, dst, maxsteps):
    # do not get dst's first value from src
    val = dst[0]
    left_steps = []
    src = list(src)
    for i in range(len(src)-1, -1, -1):
        if src[i] == val:
            left_steps.append((deleteAt, i, None))
            del src[i]
    n = index_not(dst, val)
    right_steps = [(insertAt, 0, val) for i in range(n)]
    dst = dst[n:]
    score = len(left_steps) + len(right_steps)
    nsteps, steps = rec_find_path(src, dst, maxsteps - score)
    return (score + nsteps, (left_steps + steps + right_steps))

@cnt_checked
def rec_find_path(src, dst, maxsteps):

    if maxsteps <= 0:
        if (maxsteps == 0) and (src == dst):
            return (0, [])
        else:
            raise Impossible

    # if destination is empty, clear source
    if not dst:
        if len(src) > maxsteps:
            raise Impossible
        steps = remove_all(len(src))
        return (len(steps), steps)

    found = False
    try:
        nsteps_1, steps_1 = strategy_1(src, dst, maxsteps)
    except Impossible:
        pass
    else:
        found = True
        maxsteps = nsteps_1 - 1
    try:
        nsteps_2, steps_2 = strategy_2(src, dst, maxsteps)
    except Impossible:
        if found:
            return (nsteps_1, steps_1)
        else:
            raise
    else:
        return (nsteps_2, steps_2)

def find_path(A, B):
    assert B == list(sorted(B))
    maxsteps = len(A) + len(B)
    nsteps, steps = rec_find_path(A, B, maxsteps)
    result = []
    offset = 0
    for a, b, c in steps:
        if a == incOffset:
            offset += b
        else:
            result.append((a, b + offset, c))
    return result

def check(start, target, ops):
    """Helper function to check correctness of solution"""
    L = list(start)
    for a, b, c in ops:
        print(L)
        if a == insertAt:
            L.insert(b, c)
        elif a == deleteAt:
            del L[b]
        else:
            raise RuntimeError(('Unexpected op:', a))
    print(L)
    if L != target:
        raise RuntimeError(('Result check failed, expected', target, 'got:', L))

start = [1, 4, 3, 6, 7]
target = [2, 3, 4, 5, 6, 6, 7, 8]

ops = find_path(start, target)
print(ops)

check(start, target, ops)

After some tests with this code, it is now obvious that the result is a two phases operation. There is a first phase where items are deleted from the initial list, all but a increasing sequence of items all belonging to the target list (with repetition). Missing items are then added to list until the target list is built.

The temporary conclusion is that if we find an algorithm to determine the longest subsequence of items from the target list initially present in the first list, in the same order but not necessarily contiguous, then it gives the shortest path. This is a new and potentially simpler problem. This is probably what you meant above, but it is much clearer from the program's output.

It seems almost obvious that this problem can be reduced to the problem of the longest increasing subsequence

Gribouillis
  • 2,230
  • 1
  • 9
  • 14
  • That works well for most cases! Thanks a lot! Despite the long code, I'm still trying to understand it. So far I realize it doesn't give optimal solution for few cases like when `A` is `[2, 3, 4]` and `B` is `[2, 3, 4, 5, 6, 6, 7, 8]`. In this case, it will delete `2` first before inserting, which is not needed. – billz Dec 17 '16 at 10:36
  • Bugfix. There was a missing `- score` in the line `nsteps_2, steps_2 = ...`. I think the symmetric difference step can be avoided, working on it. – Gribouillis Dec 17 '16 at 11:40
  • I pushed a better version 0.2.0 without symmetric difference step. – Gribouillis Dec 17 '16 at 13:22
  • 1
    Tested with some edge cases I had, it worked amazing! thanks man. :D – billz Dec 17 '16 at 13:39
1

We can prove quite easily that the problem reduces to the longest non-decreasing subsequence by noting that if there were a collection of elements in A that did not merit deletion and was greater in number than the longest non-decreasing subsequence in A of elements also in B, then all the elements of this collection must exist in the same order in B, which means it's a longer non-decreasing subsequence, and contradicts our assertion. Additionally, any smaller collection in A that does not merit deletion, necessarily exists in B in the same order and is therefore part of the longest non-decreasing subsequence in A of elements also in B.

The algorithm then is reduced to the longest increasing subsequence problem, O(n log n + m):

(1) Find the longest non-decreasing subsequence of elements
    in A that have at least the same count in B.

(2) All other items in A are to be deleted and the 
    missing elements from B added.
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61