1

I need to find the minimum amount of swaps required to sort a string that only has letters X, Y and Z in random order and amount (not only adjacent). Any two chars can be swapped.

For example the string ZYXZYX will be sorted in 3 swaps: ZYXZYX -> XYXZYZ -> XXYZYZ -> XXYYZZ ZZXXYY - in 4 swaps, XXXX - in 0 swaps.

So far I have this solution, but the sorting does not sort the chars in the optimal way, so the result is not always the very minimum amount of swaps. Also, the solution should be O(nlogn).

def solve(s):
  n = len(s)
  newS = [*enumerate(s)] 
  sortedS = sorted(newS, key = lambda item:item[1])

  counter = 0
  vis = {v:False for v in range(n)} 
  print(newS)
  print(sortedS)

  for i in range(n):
    if vis[i] or sortedS[i][0] == i: 
      continue
    cycle_size = 0
    j = i 

    while not vis[j]: 
      vis[j] = True 
      j = sortedS[j][0] 
      cycle_size += 1
    
    if cycle_size > 0: 
      counter += (cycle_size - 1) 

  return counter
braflovsky
  • 121
  • 2
  • Right; the adjacent elements make that link inapplicable. Uniqueness is not an issue for the algorithm. There are other duplicates, however; the optimal algorithm is available from many places; a simple browser search finds quite a few, especially on other sites. – Prune Sep 22 '20 at 20:38

3 Answers3

1

First perform an O(n) pass through the array and count the X's, Y's, and Z's. Based on the counts, we can define three regions in the array: Rx, Ry, and Rz. Rx represents the range of indexes in the array where the X's should go. Likewise for Ry and Rz.

Then there are exactly 6 permutations that need to be considered:

Rx  Ry  Rz
X   Y   Z     no swaps needed
X   Z   Y     1 swap: YZ
Y   X   Z     1 swap: XY
Y   Z   X     2 swaps: XZ and XY
Z   X   Y     2 swaps: XZ and YZ
Z   Y   X     1 swap: XZ

So all you need is five more O(n) passes to fix each possible permutation. Start with the cases where 1 swap is needed. Then fix the 2 swap cases, if any remain.

For example, the pseudocode for finding and fixing the XZY permutation is:

y = Ry.start
z = Rz.start
while y <= Ry.end && z <= Rz.end
   if array[y] == 'Z' && array[z] == 'Y'
      array[y] <--> array[z]
      swapCount++
   if array[y] != 'Z'
      y++
   if array[z] != 'Y'
      z++

The running time for each permutation is O(n), and the overall running time is O(n).

Formal proof of correctness is left as an exercise for the reader. I'll only note that cases XZY, YXZ, and ZYX fix two elements at a cost of one swap (efficiency 2), whereas cases YZX and ZXY fix three elements at a cost of two swaps (efficiency 1.5). So finding and fixing the efficient cases first (and performing inefficient cases only as needed) should give the optimal answer.

user3386109
  • 34,287
  • 7
  • 49
  • 68
  • Could you explain a bit more how you got YZXZXY in Ry and ZYZXYX in Rz? I think I misunderstand what is meant by a range of indexes where a char should go... – braflovsky Sep 23 '20 at 11:07
  • 1
    @braflovsky The rows of the table answer the following question, "In the unsorted array, how many ways can you pick three different letters, one from each region?". The first row has the X in Rx, the Y in Ry, and the Z in Rz. So all three letters are in the correct region, and there's nothing to do. The second row of the table has the X in Rx, the Z in Ry, and the Y in Rz. Swapping the Y with the Z puts all of the letters into the correct region. – user3386109 Sep 23 '20 at 17:24
0

This can be solved using breadth-first-search, for example using Raymond Hettinger's puzzle solver (full code of the solver at the bottom of the page):

class SwapXYZ(Puzzle):    
    def __init__(self, pos):
        self.pos = [x for x in pos]
        self.goal = sorted(pos)
    
    def __repr__(self):
        return repr(''.join(self.pos))
    
    def isgoal(self):
        return self.pos == self.goal
    
    def __iter__(self):
        for i in range(len(self.pos)):
            for j in range(i+1, len(self.pos)):
                move = self.pos[:]
                temp = move[i]
                move[i] = move[j]
                move[j] = temp
                yield SwapXYZ(''.join(move))

SwapXYZ("ZYXZYX").solve()
# ['ZYXZYX', 'XYXZYZ', 'XXYZYZ', 'XXYYZZ']
Jan Christoph Terasa
  • 5,781
  • 24
  • 34
0

IMVHO number of swaps for sorting such string is zero. We get counts of X, Y and Z (nx, ny, nz). Then we fill first nx elements with X, next ny elements with 'Y' and the rest with 'Z'. Complexity is o(n)

def sortxyz(a):
    nx = ny = nz = 0
    for i in range(len(a)):
        if a[i] == 'X':
            nx += 1
        elif a[i] == 'Y':
            ny += 1
        else:
            nz += 1

    return ''.join(['X'] * nx + ['Y'] *  ny + ['Z'] * nz)

print(sortxyz('YXXZXZYYX'))
XXXXYYYZZ

For more general case when elements of list can take m values complexity will be o(m * n).

Yuri Ginsburg
  • 2,302
  • 2
  • 13
  • 16
  • 1
    Or just `''.join(c * a.count(c) for c in 'XYZ')`. – superb rain Sep 23 '20 at 21:50
  • 1
    And the complexity isn't o(n) but O(n). – superb rain Sep 23 '20 at 21:51
  • @superbrain I did not use `count` in order to iterate over the list once, not three times. And you are right it is O(n). – Yuri Ginsburg Sep 23 '20 at 23:11
  • Your single iteration is much slower than the three `count` iterations combined, though. Especially when the input string is long. Like several hundred times slower. – superb rain Sep 23 '20 at 23:22
  • Hmm, apparently I confused some numbers, it becomes "only" about a hundred times slower, not several hundred. – superb rain Sep 23 '20 at 23:34
  • You still are somewhat exaggerating the performance of fastsearch. I it 30 to 35 times faster, not hundred. Also, I said "to iterate over the list once, not three times", not a single about speed. – Yuri Ginsburg Sep 24 '20 at 02:13
  • How did you measure that, and with what string? I used `'YXXZXZYYX' * 10**6`. I know you talked about number of iterations. What's your point? – superb rain Sep 24 '20 at 08:29
  • [Here's how I did it](https://repl.it/repls/TurquoiseTemptingMacroinstruction#main.py), cleaned up and with sample results showing a factor over 100 (in one case 256, so maybe earlier I didn't confuse numbers but it was a result of repl.it's instability). – superb rain Sep 24 '20 at 10:25