
Does anyone have an idea how to solve the following problem?

Take the numbers 1, ..., 100000 and permute them in some way. First you may swap two numbers. Then you have to compute how many rounds it takes to collect the numbers in ascending order, where in every round you go through the list from left to right and pick up the next numbers in order. In how many ways can you choose the initial swap so that the numbers are collected in the minimum number of rounds?

For example, if the numbers are 1 to 5 and their initial order is 3, 1, 5, 4, 2, then you can collect them in three rounds: in the first round you collect 1, 2; in the second round 3, 4; and finally 5. But there are three different swaps after which the numbers can be collected in two rounds, namely

3, 4, 5, 1, 2
3, 1, 4, 5, 2
3, 1, 2, 4, 5

A five-number sequence can easily be solved by brute force, and I found an algorithm that handles 1000 numbers, but 100000 numbers probably needs some kind of trick to compute quickly how a specific swap at the beginning affects the number of rounds.

Another example:

Take 10 numbers in the order [6, 1, 4, 10, 7, 2, 3, 9, 5, 8]. You can swap 4 and 9 to collect the numbers in three rounds. But my code reports that there are 3 ways to make a swap. Where is my mistake?

from bisect import bisect_left, bisect_right
from functools import cmp_to_key

def longest_subsequence(seq, mode='strictly', order='increasing',
                        key=None, index=False):

  bisect = bisect_left if mode.startswith('strict') else bisect_right

  # compute keys for comparison just once
  rank = seq if key is None else map(key, seq)
  if order == 'decreasing':
    rank = map(cmp_to_key(lambda x,y: 1 if x<y else 0 if x==y else -1), rank)
  rank = list(rank)

  if not rank: return []

  lastoflength = [0] # end position of subsequence with given length
  predecessor = [None] # penultimate element of l.i.s. ending at given position

  for i in range(1, len(seq)):
    # seq[i] can extend a subsequence that ends with a lesser (or equal) element
    j = bisect([rank[k] for k in lastoflength], rank[i])
    # update existing subsequence of length j or extend the longest
    try: lastoflength[j] = i
    except: lastoflength.append(i)
    # remember element before seq[i] in the subsequence
    predecessor.append(lastoflength[j-1] if j > 0 else None)

  # trace indices [p^n(i), ..., p(p(i)), p(i), i], where n=len(lastoflength)-1
  def trace(i):
    if i is not None:
      yield from trace(predecessor[i])
      yield i
  indices = trace(lastoflength[-1])

  return list(indices) if index else [seq[i] for i in indices]


def computerounds(lines):
    roundnumber = 1
    for i in range(len(lines)-1):
        if lines[i] > lines[i + 1]:
            roundnumber += 1
    return roundnumber




if __name__ == '__main__':
    lines = [[3,1,5,4,2],[6, 1, 4, 10, 7, 2, 3, 9, 5, 8]]
    case = 1
    ways_to_change = len(longest_subsequence(lines[case], mode='strictly', order='decreasing',
                        key=None, index=False))
    print(len(lines[case]), computerounds(lines[case]), ways_to_change)
    # Should return 10 3 1

Effort 1:

I guess the hardest part is to find a permutation that guarantees you collect the numbers with the minimum number of moves. I have also heard that, by Dilworth's theorem, the minimal number of ascending subsequences in a decomposition equals the length of the longest descending subsequence. https://artofproblemsolving.com/community/c163h1906044_an_algorithm_to_collect_numbers_in_ascending_order

Effort 2:

I tried to run the code by jferard on the case junar9.in found in https://www.ohjelmointiputka.net/tiedostot/junar.zip. The file contains the number of numbers on the first line, and the remaining lines give the numbers in their original order. It looks like it takes too much memory. The output on Linux Mint was:

(base) jaakko@jaakko-Aspire-E1-572:~/.config/spyder-py3$ python3 temp.py 
Killed

Here is the code from temp.py

# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""

import os.path
import requests
import zipfile
import warnings

def invert(L):
    M = [None] + [0 for _ in range(len(L))]
    for i, k in enumerate(L):
        M[k] = i
    return M


def perform_data(read_data):
    s = ""
    for i in range(len(read_data)):
        if read_data[i].isnumeric():
            s += read_data[i]
        else:
            s += " "
    s = s[:-1]
    s = s.split(" ")
    tmp = []
    for i in range(1, len(s)):
        if s[i] != '':
            tmp.append(int(s[i]))
    return tmp


def download_zipfile(url):
    if not os.path.isfile('/tmp/junar.zip'):
        with open('/tmp/junar.zip', 'wb') as out:
            out.write(requests.get(url).content)


def read_zipfile_item(filename):
    with zipfile.ZipFile('/tmp/junar.zip') as zip_file:
        with zip_file.open(filename) as f:
            return f.read().decode('utf8')

def generate_original_rounds(A):
    B =[0]*(len(A)-1)
    print(A)
    roundno = 1
    for i in range(1,len(A)):
        if A.index(i) < A.index(i+1):
            B[i-1] = roundno
        else:
            roundno += 1
            B[i-1] = roundno
            print(roundno)
    return B


def classify_moves(L):
    M = invert(L)
    N = len(L)
    good_moves, bad_moves = [None], [None]

    for k in range(1, N+1):
        good_move, bad_move = find_moves(k, L, M, N)
        good_moves.append(good_move)
        bad_moves.append(bad_move)

    return good_moves, bad_moves


def find_moves(k, L, M, N):
    def in_range(a, b):
        return set(L[j] for j in range(a, b))

    good_move = set()
    bad_move = set()
    if k == 1:
        if M[k+1] < M[k]:
            good_move |= in_range(0, M[k+1]+1)
        else: # M[k] < M[k+1]
            bad_move |= in_range(M[k+1], N)
    elif k == N:
        if M[k] < M[k-1]:
            good_move |= in_range(M[k-1], N)
        else: # M[k-1] < M[k]
            bad_move |= in_range(0, M[k-1]+1)
    elif M[k-1] < M[k+1]:
        if M[k] < M[k-1]:
            good_move |= in_range(M[k-1], M[k+1])
        elif M[k+1] < M[k]:
            good_move |= in_range(M[k-1]+1, M[k+1]+1)
        if M[k-1] < M[k]:
            bad_move |= in_range(0, M[k-1]+1)
        if M[k] < M[k+1]:
            bad_move |= in_range(M[k+1], N)
    else: # M[k+1] < M[k-1]
        if M[k+1] < M[k] < M[k-1]:
            good_move |= in_range(0, M[k+1]+1) | in_range(M[k-1], N)
        elif M[k] < M[k+1]:
            bad_move |= in_range(M[k+1], M[k-1])
        else: # M[k-1] < M[k]:
            bad_move |= in_range(M[k+1]+1, M[k-1]+1)

    return good_move, bad_move


def collate_moves_aux(L):
    good_moves, bad_moves = classify_moves(L)
    N = len(L)
    swaps_by_removed = {}
    for i in range(1, N+1):
        for j in range(i+1, N+1):
            removed = 0
            if j in good_moves[i]:
                if i in good_moves[j]:
                    removed = 2
                elif i not in bad_moves[j]:
                    removed = 1
            elif j not in bad_moves[i] and i in good_moves[j]:
                removed = 1
            if abs(i-j) <= 1: # don't count twice
                removed -= 1

            if removed > 0:
                swaps_by_removed.setdefault(removed, []).append((i,j))

    return swaps_by_removed


def collate_moves(L):
    swaps_by_removed = collate_moves_aux(L)

if __name__ == '__main__':
    # Testing
    url = 'https://www.ohjelmointiputka.net/tiedostot/junar.zip'
    download_zipfile(url=url)
    rawdata = read_zipfile_item('junar9.in')
    data = perform_data(rawdata)
    numbers = data
    A = collate_moves(numbers)
    print(A)

Idea 1: Is it helpful to compute permutation inversions somehow, http://mathworld.wolfram.com/PermutationInversion.html ? There are some algorithms to compute all permutation inversions in https://www.geeksforgeeks.org/counting-inversions/ but does this help solve the problem? I think it is somehow related to computing the permutation inversions of the form (n,n+1).

Effort 3: I tried to apply the idea from jferard's answer. I think it computes the wrong number of rounds for collecting the numbers [6, 1, 4, 10, 7, 2, 3, 9, 5, 8]. It returns 4, but it takes five rounds: first 1, 2, 3; second 4, 5; third 6, 7, 8; fourth 9; and fifth 10.

def compute_original_rounds(M):
    c = 1
    for i in range(2, len(M)):
        if M[i] < M[i-1]:
            c += 1
    return c

if __name__ == '__main__':
    lines = [[3,1,5,4,2],[6, 1, 4, 10, 7, 2, 3, 9, 5, 8]]
    verygoods = 0
    lista = lines[1]
    best = 0
    drops = [0,0,0]
    for k in range(2,len(lista)):
        a = lista.index(k-1)<lista.index(k)
        b = lista.index(k)<lista.index(k+1)
        c = lista.index(k-1)<lista.index(k+1)
        if a and b:
            print("Zero inversions")
            drops[0] += 1
        if (not a and c) or (c and not b) or (b and not c) or (a and not c):
            print("One inversion")
            best = max(best,1)
            drops[1] += 1
        if not b and not a:
            print("Two inversions")
            best = max(best,2)
            drops[2] += 1
    ways = drops[2]
    if ways == 0:
        ways = drops[1]
        if ways == 0:
            ways = drops[0]
    original_rounds = compute_original_rounds(lista)
    print(original_rounds)
    print(len(lista),original_rounds - best, ways)
Jaakko Seppälä
  • Interesting project. Just a (cosmetic) issue of your coding style: why are you using `mode.startswith('strict')` instead of comparing with `strictly`, why at all such long switches instead of `strict`, `inc`, and `dec` which would make at least the annoying horizontal scrollbar of the code box disappear? :) – Wolf Sep 16 '19 at 08:29
  • Well, I found the code for the longest_subsequence from Stack Overflow and used it directly in my project. – Jaakko Seppälä Sep 16 '19 at 08:46
  • I see. In general, I'm not sure if your question addresses a mathematical or a programming problem. To get help with programming, you should try to make the code as clear as possible and to choose an example that is as small as possible to show the actual problem. I tried the code, it runs and the actual output is `10 5 3`, but it's not obvious to me why this must be wrong (without diving into the mathematical details - I'd expect I'm not the only one who hesitates in this point). – Wolf Sep 16 '19 at 09:22
  • Why did you choose a ["pretty general solution"](https://stackoverflow.com/a/38337443/2932052) instead of the [highest voted and accepted answer](https://stackoverflow.com/a/9832414/2932052)? – Wolf Sep 16 '19 at 10:09
  • I just tested it and it looks it worked. – Jaakko Seppälä Sep 16 '19 at 10:13
  • I just tested the [simpler code](https://stackoverflow.com/a/9832414/2932052) which returned the correct `[1 2 3 5 8]` for your 10-element sequence. – Wolf Sep 16 '19 at 10:22

2 Answers


I don't see how the longest decreasing subsequence will give you the number of swaps. According to Dilworth's theorem, the longest antichain (subsequence of decreasing numbers) gives you the width of your list, that is, the minimum number of chains (sequences of increasing numbers) in a partition of the list.

Note that Dilworth's theorem might not be applicable here because the chains (sequences of numbers in your case) should be ordered and the numbers have to be consecutive ([6, 1, 4, 10, 7, 2, 3, 9, 5, 8] is a counter-example: 3 Dilworth chains but 5 rounds).
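The counter-example can be checked by computing both quantities side by side. This is only an illustrative sketch; the helper names `longest_decreasing_length` and `count_rounds` are made up here and do not come from the question's code.

```python
from bisect import bisect_left


def longest_decreasing_length(seq):
    """Length of the longest strictly decreasing subsequence of seq.

    Patience-sorting trick on the negated values: a strictly increasing
    subsequence of the negated list is a strictly decreasing one of seq.
    """
    tails = []  # tails[j] = smallest possible last key of a chain of length j+1
    for x in seq:
        key = -x
        j = bisect_left(tails, key)
        if j == len(tails):
            tails.append(key)
        else:
            tails[j] = key
    return len(tails)


def count_rounds(perm):
    """Rounds needed to collect 1..N from left to right (perm is a permutation of 1..N)."""
    pos = {value: index for index, value in enumerate(perm)}
    return 1 + sum(1 for k in range(1, len(perm)) if pos[k + 1] < pos[k])


example = [6, 1, 4, 10, 7, 2, 3, 9, 5, 8]
print(longest_decreasing_length(example), count_rounds(example))
```

For this list the two numbers differ (3 versus 5), which is why the longest decreasing subsequence cannot be used directly as the round count.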

Here's an attempt. The solution is complicated and I hope that a more straightforward answer exists, but I didn't find one. I cannot say for sure that it is bug-free.

Compute the number of rounds

To compute the number of rounds in O(n), let's follow this method:

  • Start with 1, 2, 3, ... until you find a k having idx(k+1) < idx(k), where idx is the index in the original list (let's call this an inversion).
  • The first round is finished, and the second starts with k+1, k+2, ... until you find an l having idx(l+1) < idx(l),
  • and so on until the list is exhausted.

Hence the formula: number of rounds = 1 + |{k in L | idx(k+1) < idx(k)}|. Example with 3,1,5,4,2: idx(3)<idx(2) and idx(5)<idx(4), thus the number of rounds is 3.

In Python:

def invert(L):
    M = [None] + [0 for _ in range(len(L))]
    for i, k in enumerate(L):
        M[k] = i
    return M

def rounds(M):
    c = 1
    for i in range(2, len(M)):
        if M[i] < M[i-1]:
            c += 1
    return c

>>> rounds(invert([3, 1, 5, 4, 2]))
3
>>> rounds(invert([6, 1, 4, 10, 7, 2, 3, 9, 5, 8]))
5
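To see which numbers fall into which round, and not only how many rounds there are, a small variation can group the values. This is a sketch using the same definition of an inversion; `split_into_rounds` is a hypothetical helper name and it assumes the input is a permutation of 1..N.

```python
def split_into_rounds(perm):
    """Group the values 1..N into the rounds in which they are collected."""
    if not perm:
        return []
    pos = {value: index for index, value in enumerate(perm)}
    rounds = [[1]]
    for k in range(2, len(perm) + 1):
        if pos[k] < pos[k - 1]:
            # k lies to the left of k-1, so a new round has to start here
            rounds.append([k])
        else:
            rounds[-1].append(k)
    return rounds


print(split_into_rounds([3, 1, 5, 4, 2]))
print(split_into_rounds([6, 1, 4, 10, 7, 2, 3, 9, 5, 8]))
```

The length of the returned list is the number of rounds, so it can also serve as a cross-check for `rounds(invert(L))`.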

Good and bad moves

That was the easy part. Now focus on a given k in L. You have six possibilities:

... k   ... k-1 ... k+1 ... : 1 inversion
... k-1 ... k   ... k+1 ... : 0 inversions
... k-1 ... k+1 ... k   ... : 1 inversion

... k   ... k+1 ... k-1 ... : 1 inversion
... k+1 ... k   ... k-1 ... : 2 inversions
... k+1 ... k-1 ... k   ... : 1 inversion

We call a "good move" a move from a situation with 1 inversion to a situation with 0 inversions, or from 2 inversions to 1 inversion. Conversely, a "bad move" is a move from a situation with 0 inversions to 1 inversion, or from 1 inversion to 2 inversions. When performing a swap, we want to avoid bad moves and make good moves. The best we can do is to make two good moves at once, reducing the number of rounds by 2.

First, we will compute, for every k, the good and the bad moves. We have to deal with edge cases (k == 1 or k == N), and with the two main possibilities (idx(k-1) < idx(k+1) and idx(k+1) < idx(k-1)). The swaps between k and k-1 or k+1 should be considered too. That gives the cumbersome piece of code below:

def classify_moves(L):
    M = invert(L)
    N = len(L)
    good_moves, bad_moves = [None], [None]

    for k in range(1, N+1):
        good_move, bad_move = find_moves(k, L, M, N)
        good_moves.append(good_move)
        bad_moves.append(bad_move)

    return good_moves, bad_moves

def find_moves(k, L, M, N):
    def in_range(a, b):
        return set(L[j] for j in range(a, b))

    good_move = set()
    bad_move = set()
    if k == 1:
        if M[k+1] < M[k]:
            good_move |= in_range(0, M[k+1]+1)
        else: # M[k] < M[k+1]
            bad_move |= in_range(M[k+1], N)
    elif k == N:
        if M[k] < M[k-1]:
            good_move |= in_range(M[k-1], N)
        else: # M[k-1] < M[k]
            bad_move |= in_range(0, M[k-1]+1)
    elif M[k-1] < M[k+1]:
        if M[k] < M[k-1]:
            good_move |= in_range(M[k-1], M[k+1])
        elif M[k+1] < M[k]:
            good_move |= in_range(M[k-1]+1, M[k+1]+1)
        if M[k-1] < M[k]:
            bad_move |= in_range(0, M[k-1]+1)
        if M[k] < M[k+1]:
            bad_move |= in_range(M[k+1], N)
    else: # M[k+1] < M[k-1]
        if M[k+1] < M[k] < M[k-1]:
            good_move |= in_range(0, M[k+1]+1) | in_range(M[k-1], N)
        elif M[k] < M[k+1]:
            bad_move |= in_range(M[k+1], M[k-1])
        else: # M[k-1] < M[k]:
            bad_move |= in_range(M[k+1]+1, M[k-1]+1)

    return good_move, bad_move

>>> classify_moves([3, 1, 5, 4, 2])
([None, set(), set(), set(), {1, 5}, {2, 4}], [None, {2}, {1}, {4}, {3}, set()])

That means that, for instance, from 4's point of view, a swap with 1 or 5 is good, and a swap with 3 would be bad.

Choosing the swaps

Now, we have to collate all those good and bad moves into a list of acceptable swaps. The idea is simple: for every pair (i,j), if i is a good move from j and j is a good move from i, then we can remove two rounds. If the swap is a good move from one side and not a bad move from the other side, then we can remove one round. There are again some subtleties: 1) we have a list of swaps removing 1 round, but we throw away those swaps as soon as we find a swap removing 2 rounds (the best we can do). 2) when k is a good move from k+1 and k+1 a good move from k, we don't remove two rounds but only one (the good move was counted twice by the classify_moves function).

def collate_moves_aux(L):
    good_moves, bad_moves = classify_moves(L)
    N = len(L)
    swaps_by_removed = {}
    for i in range(1, N+1):
        for j in range(i+1, N+1):
            removed = 0
            if j in good_moves[i]:
                if i in good_moves[j]:
                    removed = 2
                elif i not in bad_moves[j]:
                    removed = 1
            elif j not in bad_moves[i] and i in good_moves[j]:
                removed = 1
            if abs(i-j) <= 1: # don't count twice
                removed -= 1

            if removed > 0:
                swaps_by_removed.setdefault(removed, []).append((i,j))

    return swaps_by_removed

def collate_moves(L):
    swaps_by_removed = collate_moves_aux(L)
    return max(swaps_by_removed.items(), key=lambda i: i[0])

>>> collate_moves_aux([3, 1, 5, 4, 2])
{1: [(1, 4), (2, 5), (4, 5)]}
>>> collate_moves([3, 1, 5, 4, 2])
(1, [(1, 4), (2, 5), (4, 5)])

And:

>>> collate_moves_aux([6, 1, 4, 10, 7, 2, 3, 9, 5, 8])
{1: [(3, 8), (5, 10), (8, 9), (9, 10)], 2: [(4, 9)]}    
>>> collate_moves([6, 1, 4, 10, 7, 2, 3, 9, 5, 8])
(2, [(4, 9)])

The complexity of the algorithm is O(N^2): invert is O(N), classify_moves is O(N^2) because find_moves is O(N) (it builds sets of size < N) and collate_moves is O(N^2) (amortized, because of the set lookups).
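Since a swap of the values a and b can only change the pairs (a-1, a), (a, a+1), (b-1, b) and (b, b+1), the effect of a swap can be recomputed from the index table alone, without storing any sets. The following is a minimal sketch of that idea, checked only against the two examples above; `positions`, `rounds_removed_by_swap` and `best_swaps` are names invented for this sketch. It uses O(N) memory but still tries all O(N^2) pairs, so it is unlikely to be fast enough for N = 100000.

```python
def positions(perm):
    """pos[v] = index of value v in perm (perm is a permutation of 1..N)."""
    pos = [0] * (len(perm) + 1)
    for index, value in enumerate(perm):
        pos[value] = index
    return pos


def rounds_removed_by_swap(pos, n, a, b):
    """Number of rounds saved (negative if rounds are added) by swapping values a and b."""
    # Only the pairs (k, k+1) that contain a or b can change.
    pairs = {k for v in (a, b) for k in (v - 1, v) if 1 <= k < n}
    before = sum(1 for k in pairs if pos[k + 1] < pos[k])
    pos[a], pos[b] = pos[b], pos[a]          # apply the swap
    after = sum(1 for k in pairs if pos[k + 1] < pos[k])
    pos[a], pos[b] = pos[b], pos[a]          # undo the swap
    return before - after


def best_swaps(perm):
    """Return (rounds removed, list of value pairs) for the best single swaps."""
    n = len(perm)
    pos = positions(perm)
    best, swaps = 0, []
    for a in range(1, n + 1):
        for b in range(a + 1, n + 1):
            removed = rounds_removed_by_swap(pos, n, a, b)
            if removed > best:
                best, swaps = removed, [(a, b)]
            elif removed == best and removed > 0:
                swaps.append((a, b))
    return best, swaps


print(best_swaps([3, 1, 5, 4, 2]))
print(best_swaps([6, 1, 4, 10, 7, 2, 3, 9, 5, 8]))
```

Only swaps that strictly reduce the number of rounds are reported; if no swap helps, the result is `(0, [])`.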

I hope someone produces a simpler version of this!

jferard
  • @גלעדברקן I think the OP wants to output `1,2,...,N` (consecutive *in the output* but not necessarily in the input) with the minimum number of backtracks. (See how I count rounds.) The counterexample is: [6, 1, 4, 10, 7, 2, 3, 9, 5, 8] gives, with Dilworth's theorem, three chains (as you wrote: [6 7 9] [1 2 3 5 8] [4 10]) but has five rounds. – jferard Sep 21 '19 at 07:02
  • Thanks for that. Actually what I need is two functions. Input is a given permutation. First function should return what is the minimal number of rounds that is needed to collect numbers if there is just one swap of two numbers at the beginning. The second should return how many ways there are to make the swap at the beginning such that one can collect the number with minimal number of rounds after such a swap. I try to understand how the code helps to compute those numbers. – Jaakko Seppälä Sep 22 '19 at 18:38
  • @JaakkoSeppälä You can easily build those functions: "the minimal number of rounds that is needed to collect numbers if there is just one swap of two numbers at the beginning" is the initial number of rounds minus `removed_rounds`, "how many ways there are to make the swap at the beginning" is `len(swaps)`. – jferard Sep 22 '19 at 18:40
  • I have trouble implementing that. I ran this on Linux and the Python interpreter was killed. Unfortunately, it looks like this approach takes too much memory. – Jaakko Seppälä Dec 02 '19 at 20:03
  • @JaakkoSeppälä Maybe the memory problem comes from the `good_moves` and `bad_moves` lists of lists. You could forget the `bad_moves` and try to find only the *first* "double good move", ie in the loop of `classify_moves`, check if you have a `i < k` so that `i in good_moves[k] and k in good_moves[i]`. If this is the case, then return `i, k`. – jferard Dec 02 '19 at 21:26

I would consider a structure like the following. Dashes show us where there is a switch between output sequence rounds. Candidates that need replacement are in the square brackets; these are the indexes immediately before and after the round switch. In parentheses, we have ranges of increasing index sequences that neighbour the candidates.

i: 1 2 3 4 5 6 7 8 9 10
A: 2 1 6 7 8 9 3 4 5 10

indexed, ordered output:
2 1 7 8 9 3 4 5 6 10
 -       -           (round switches)

[2,1](7..8)[9,3](4..10)

Now we insert the candidates, as well as the lower and upper bounds of each range, into a binary search tree (or just a sorted array on which we can binary search), where each node also points to its position in the indexed, sorted output list. For each candidate index, we would like to test other indexes in the tree that fit. Once one is found, we can perform a linear search from that index to gather more possibilities. Notice that the candidates that warrant replacing are the ones that, if removed, would leave an increasing sequence between their neighbours in the input.

2: find x ≤ 1: result 1
1: find 2 ≤ x ≤ 7: result 3
  (linear search is stopped by 9 and 4)
9: invalid candidate
3: invalid candidate

Swaps are therefore indexes (2,1) or (1,3).
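The "indexed, ordered output" rows and the bracketed candidates used in these examples can be reproduced with a short sketch. It only builds the structure, not the search itself; the function names `indexed_output` and `switch_candidates` are invented here, and indexes are 1-based as in the tables.

```python
def indexed_output(A):
    """1-based index of each value 1..N in A, listed in value order."""
    index_of = {value: i for i, value in enumerate(A, start=1)}
    return [index_of[v] for v in range(1, len(A) + 1)]


def switch_candidates(A):
    """Pairs of indexes immediately before and after each round switch."""
    out = indexed_output(A)
    return [(out[i], out[i + 1])
            for i in range(len(out) - 1)
            if out[i + 1] < out[i]]       # index goes down: a new round starts


A = [2, 1, 6, 7, 8, 9, 3, 4, 5, 10]
print(indexed_output(A))
print(switch_candidates(A))
```

Consecutive pairs that share an index, such as (10, 8) and (8, 4), correspond to a bracket with three entries like [10,8,4].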

Example 1 from the question:

i: 1 2 3 4 5
A: 3 1 5 4 2

indexed, ordered output:
2 5 1 4 3
   -   -  (round switches)

(2)[5,1][4,3]

candidates:
5: invalid candidate
1: invalid candidate
4: find 1 ≤ x ≤ 3: results 2, 3
3: find x ≥ 4: result 5

swaps: (4,2) (4,3) (3,5)

Example 2 from the question:

i: 1 2 3  4 5 6 7 8 9 10
A: 6 1 4 10 7 2 3 9 5 8

indexed, ordered output:
2 6 7 3 9 1 5 10 8 4
     -   -      - -  (round switches)

(2..6)[7,3][9,1](5)[10,8,4]

candidates:

7: invalid
3: find 7 ≤ x ≤ 9: result 8 (9 wouldn't reduce rounds)
9: invalid
1: invalid
10: find 5 ≤ x ≤ 8: results 7, 8
8: invalid
4: find x ≥ 8: results 8, 9 (10 wouldn't reduce rounds)

original rounds:
(1 2 3)(4 5)(6 7 8)(9)(10)

swaps:
(3,8) -> (1 2 3 4 5)(6 7 8)(9 10)
(10,7) -> (1 2 3)(4 5)(6 7 8 9)(10)
(10,8) -> (1 2 3)(4 5)(6 7 8 9)(10)
(4,8) -> (1 2 3)(4 5)(6 7 8)(9 10)
(4,9) -> (1 2 3)(4 5)(6 7 8)(9 10)
גלעד ברקן
  • I see what you mean, but I think there is an issue with invalid candidates. For the list `[6, 1, 4, 10, 7, 2, 3, 9, 5, 8]`, I get the following swaps: `(4, 9)` (values) removes two rounds and `(3, 8), (5, 10), (8, 9), (9, 10)` (values) remove one round (see my edited answer). Hence I think the indices `8` and `10` shouldn't be invalid. – jferard Sep 22 '19 at 19:51
  • I was swapping values, **not indices**: base 6 1 4 10 7 2 3 9 5 8, swap (3,8) [indices: 7,10]: 6 1 4 10 7 2 8 9 5 3, rounds: (1 2 3)(4 5)(6 7 8 9)(10)::: swap (5,10) [i: 4,9]: 6 1 4 5 7 2 3 9 10 8, rounds: (1 2 3)(4 5)(6 7 8)(9 10) ::: swap (8,9) [i: 8,10]: 6 1 4 10 7 2 3 8 5 9, rounds: (1 2 3)(4 5)(6 7 8 9)(10) ::: swap (9,10) [i: 4,8],: 6 1 4 9 7 2 3 10 5 8, rounds: (1 2 3)(4 5)(6 7 8)(9 10). – jferard Sep 23 '19 at 10:02
  • @jferard ah, right. I missed those ones, but they seem to be working with the algorithm I proposed. Thanks! – גלעד ברקן Sep 23 '19 at 11:34