3

Problem:

I would like to randomly shuffle a list, but using a seed value s so that, when changing s only slightly, the shuffle's output also only changes slightly.

Details:

I have a sorted list of elements, for example:

[0,1,3,5,7]

This list should be shuffled multiple times, each time using a seed value s. When two seed values s1 and s2 = s1 + d lie close to each other, then the shuffled lists should also be "similar".

By "similar", I mean that elements in the new shuffled list are either the same as they were when using the original seed value s1, or they are replaced only by values which are close to them in the original list (for example their direct neighbors in the original list).

Edit: The output should still be deterministic, i.e. using the same seed s1 when shuffling the same input list should result in the same shuffled list.

Example for the above list (note how adding the small value d only perturbs the list slightly, i.e. many values stay the same and if they change, they are usually replaced by their neighbors in the original list. As the offset increases, the "difference" between the lists may increase further, and values beyond the neighbors may be selected as well):

Seed: Output:
s [5,0,1,7,3]
s + d [5,0,3,7,1]
s + 2d [7,0,3,5,1]
s + 3d [7,0,1,3,5]

Is there an algorithm for this? Is there a common name for this problem (or other search terms I could use)?

Edit2: The output should also not depend on what the original seed was, i.e. if s = s3 + d = s4 - 0.5*d then the shuffle result should be the same whether I shuffle using s, s3 + d or s4 - 0.5*d as a seed. In other words, I only pass the final seed to the algorithm, not the original seed s3 or s4. The reason is that I want to interpolate between the seeds and the result should be an "interpolation" between the list's permutations.

Ideas I had thus far:

I could use simplex/perlin noise to sample elements: I generate a number between 0 and 1 and then use it to choose the next element from the list (0 meaning I choose the first element, 1 meaning I choose the last). Since these noise types can be "smooth", adding d will change the random value only slightly, meaning it will usually select elements in proximity of the ones it picked before adding d. However, I have a very hard time choosing a "good" frequency for the noise (high frequencies will remove the smoothness I need, low frequencies will result in no change when adding d). Also, once the list shrinks as I pick elements, the effect of d will decrease, because the range 0 to 1 will be mapped to fewer elements - and I don't really know if this is a problem yet...?
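To make the idea concrete, here is a rough Python sketch of the noise-based picker. Since I don't have a simplex/perlin library at hand, it uses simple 1-D value noise (deterministic values at integer lattice points, cosine-interpolated in between) as a stand-in; the `frequency` knob and the per-pick offset of 1000 are illustrative guesses, not tuned values:

```python
import math
import random

def value_noise(x, frequency=0.1):
    """1-D value noise: deterministic pseudo-random values at integer
    lattice points, smoothly interpolated in between. A stand-in for
    simplex/perlin noise; `frequency` is the knob discussed above."""
    t = x * frequency
    i = math.floor(t)
    f = t - i

    def lattice(k):
        # Deterministic value in [0, 1) for lattice point k.
        return random.Random(k).random()

    u = (1 - math.cos(f * math.pi)) / 2   # cosine smoothstep
    return lattice(i) * (1 - u) + lattice(i + 1) * u

def noise_shuffle(items, seed, frequency=0.1):
    """Pick elements one by one, indexing into the remaining list with a
    smooth noise value, per the idea above. The offset of 1000 per pick
    decorrelates successive picks while small seed changes stay smooth."""
    pool = list(items)
    out = []
    k = 0
    while pool:
        v = value_noise(seed + k * 1000, frequency)
        idx = min(int(v * len(pool)), len(pool) - 1)
        out.append(pool.pop(idx))
        k += 1
    return out
```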

Germanunkol
    Shuffle the list once. From then on don't shuffle the whole list, just make d swaps between pairs of list elements. If d=1 then swap 1 random pair. If d=3 then swap 3 random pairs and so on. – rossum May 24 '21 at 08:05
  • maybe something in https://stackoverflow.com/q/62436299/1358308 helps? could set variance based on difference – Sam Mason May 24 '21 at 10:14
  • @rossum On first thought, this seems to solve the problem as posed. However, I don't want the result to depend on which original seed I started from. If $s1 + 2d = s2$, then the shuffle result using $s1 + d$ should be the same as the one for $s2 - d$. I'll have to add this to the question; I guess it's not clear from the way I posed it. – Germanunkol May 25 '21 at 21:07
  • I agree with @rossum. Pick a large-ish N. To generate permutation k*N for any k, seed the rng with N*k and use the normal shuffle algorithm. For any other permutation N*k + i ,0 < i < N, first compute permutation N*k, then slightly perturb it i times with something like Sam Mason suggested. You should set N roughly to the max list length because N "slight" perturbations of the shuffle will be nearly equivalent to another shuffle. – Gene May 28 '21 at 01:56

3 Answers

3

Generate and set aside a random shuffle R of the list. Then, every time the generator is invoked with seed s:

  1. Let s = s % factorial(N), where N is the length of the list (or simply require that s lie between 0 and N!-1).
  2. Find the factorial-base representation of s.
  3. Convert that representation to a permutation p.
  4. Apply p to R and return the result.

The construction is such that the permutations p(s) and p(s+1) are adjacent in lexicographic order. The steps can be implemented efficiently.
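A minimal Python sketch of these steps (the function names and the `base_seed` used to generate the set-aside shuffle R are illustrative assumptions):

```python
import random
from math import factorial

def lehmer_permutation(s, n):
    """Convert seed s (taken mod n!) to a permutation of 0..n-1 via its
    factorial-base (Lehmer code) representation. Seeds s and s+1 yield
    lexicographically adjacent permutations."""
    s %= factorial(n)
    digits = []
    for base in range(1, n + 1):          # least-significant digit first
        digits.append(s % base)
        s //= base
    digits.reverse()                      # most-significant digit first
    pool = list(range(n))
    return [pool.pop(d) for d in digits]  # decode Lehmer code

def seeded_shuffle(items, s, base_seed=12345):
    """Generate the set-aside random shuffle R, then apply the
    permutation p(s) to it."""
    R = items[:]
    random.Random(base_seed).shuffle(R)   # the fixed shuffle R
    p = lehmer_permutation(s, len(R))
    return [R[i] for i in p]
```

For example, `lehmer_permutation(0, 3)` gives `[0, 1, 2]` and `lehmer_permutation(1, 3)` gives `[0, 2, 1]`, the next permutation in lexicographic order.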

You could use the same idea with other orderings of the set of permutations, which minimize changes even more, such as the Heap ordering or the Steinhaus-Johnson-Trotter ordering, but I don't know whether step 3. can be implemented efficiently in those cases.

Vincenzo
    Doesn't the lexicographic closeness take a hit regularly? Say `p(s) = 34210` where N = 5, then `p(s +1) = 40123` which is very different. – beria Jun 01 '21 at 12:28
  • As I said, you can use the same idea with a different ordering of the permutations. – Vincenzo Jun 02 '21 at 14:36
2

If you can bear redefining "similarity" as the number of swaps between elements of a permutation, then we can use the Trotter-Johnson permutation ordering, in which successive permutations differ by only one swap of two adjacent items.

There are algorithms that associate consecutive integer ranks with successive perms of the ordering, and further algorithms that generate the perm from the rank and vice versa. Using the seed as the rank value, two seeds that differ by n would need n swaps of items to go between their respective perms.

A lot of searching turns up the book:

D.L. Kreher and D.R. Stinson, Combinatorial Algorithms: Generation, Enumeration, and Search, CRC Press, 1999.

I have turned it into the following Python:

# -*- coding: utf-8 -*-
"""
Created on Thu Jun  3 08:44:56 2021

@author: Paddy3118
"""

from typing import List

Perm = List[int]

_fact = [1]     # factorials cache


def print_perm(T: Perm) -> None:
    print(T)

def tj_unrank(n: int, r: int) -> Perm:
    "Returns the r-ranked Trotter-Johnson permutation of integers 0..n-1"
    global _fact

    for i in range(len(_fact), n+2):    # Extend factorial cache if necessary.
        _fact.append(_fact[i - 1] * i)

    pi: Perm = [0] * (n+2)
    pi[1] = 1
    r2 = 0
    for j in range(2, n+1):
        r1 = (r * _fact[j]) // _fact[n]
        k = r1 - j*r2
        if ((r2 % 2) == 0):
            for i in range(j-1, j - k - 1, -1):
                pi[i+1] = pi[i]
            pi[j-k] = j
        else:
            for i in range(j - 1, k, -1):
                pi[i+1] = pi[i]
            pi[k + 1] = j
        r2 = r1

    return [i - 1 for i in pi[1:-1]]

def tj_rank(n: int, p: Perm) -> int:
    "Returns the ranking of the Trotter-Johnson permutation p, of integers 0..n-1"
    assert set(p) == set(range(n)), f"Perm {p} not a perm of 0..{n-1}."

    pi = [0] + [i+1 for i in p] + [0]
    r = 0
    for j in range(2, n + 1):
        i = k = 1
        while pi[i] != j:
            if (pi[i] < j):
                k += 1
            i += 1
        if ((r % 2) == 0 ):
            r = j*r+j-k
        else:
            r = j*r+k-1

    return r

def tj_parity(p: Perm) -> int:
    "Returns the 0/1 parity of the Trotter-Johnson permutation p, of integers 0..n-1"
    n = len(p)
    assert set(p) == set(range(n)), f"Perm {p} not a perm of 0..{n-1}."

    pi = [0] + [i+1 for i in p] + [0]
    a, c = [0] * (n + 1), 0
    for j in range(1, n+1):
        if a[j] == 0:
            c += 1
            a[j] = 1
            i = j
            while ( pi[i] != j ):
                i = pi[i]
                a[i] = 1

    return (n-c) % 2


if __name__ == '__main__':
    from sys import argv
    import re

    if not (len(argv) == 2 and re.fullmatch(r'\d+', argv[1])):
        raise SystemExit('ERROR. Call needs one integer > 0 argument')
    n = int(argv[1])
    if n <= 0:
        raise SystemExit('ERROR. Call needs one integer > 0 argument')

    print(f"Testing rank/unrank n={n}.\n");
    for i in range(len(_fact), n+2):    # Extend factorial cache if necessary.
        _fact.append(_fact[i - 1] * i)

    for r in range(_fact[n]):
        p = tj_unrank(n, r)
        rank = tj_rank(n, p)
        parity = tj_parity(p)
        print(f"  Rank: {r:4} to perm: {p}, parity: {parity} to rank: {rank}")

Sample output for n == 4

Testing rank/unrank n=4.

  Rank:    0 to perm: [0, 1, 2, 3], parity: 0 to rank: 0
  Rank:    1 to perm: [0, 1, 3, 2], parity: 1 to rank: 1
  Rank:    2 to perm: [0, 3, 1, 2], parity: 0 to rank: 2
  Rank:    3 to perm: [3, 0, 1, 2], parity: 1 to rank: 3
  Rank:    4 to perm: [3, 0, 2, 1], parity: 0 to rank: 4
  Rank:    5 to perm: [0, 3, 2, 1], parity: 1 to rank: 5
  Rank:    6 to perm: [0, 2, 3, 1], parity: 0 to rank: 6
  Rank:    7 to perm: [0, 2, 1, 3], parity: 1 to rank: 7
  Rank:    8 to perm: [2, 0, 1, 3], parity: 0 to rank: 8
  Rank:    9 to perm: [2, 0, 3, 1], parity: 1 to rank: 9
  Rank:   10 to perm: [2, 3, 0, 1], parity: 0 to rank: 10
  Rank:   11 to perm: [3, 2, 0, 1], parity: 1 to rank: 11
  Rank:   12 to perm: [3, 2, 1, 0], parity: 0 to rank: 12
  Rank:   13 to perm: [2, 3, 1, 0], parity: 1 to rank: 13
  Rank:   14 to perm: [2, 1, 3, 0], parity: 0 to rank: 14
  Rank:   15 to perm: [2, 1, 0, 3], parity: 1 to rank: 15
  Rank:   16 to perm: [1, 2, 0, 3], parity: 0 to rank: 16
  Rank:   17 to perm: [1, 2, 3, 0], parity: 1 to rank: 17
  Rank:   18 to perm: [1, 3, 2, 0], parity: 0 to rank: 18
  Rank:   19 to perm: [3, 1, 2, 0], parity: 1 to rank: 19
  Rank:   20 to perm: [3, 1, 0, 2], parity: 0 to rank: 20
  Rank:   21 to perm: [1, 3, 0, 2], parity: 1 to rank: 21
  Rank:   22 to perm: [1, 0, 3, 2], parity: 0 to rank: 22
  Rank:   23 to perm: [1, 0, 2, 3], parity: 1 to rank: 23

(Notice how successive perms above differ by a single swap of two adjacent items.)

Paddy3118
  • Note that factorials can quickly exceed what Python's random.sample() function can handle; random.randrange will work with larger factorials, such as the 250 decimal digits of 144! – Paddy3118 Jun 03 '21 at 12:54
  • Note that how you map your objects to the ints of the perm, in addition to the seed used as the rank, determine the eventual permutation of your objects. – Paddy3118 Jun 04 '21 at 06:38
1

It is rather difficult to give a metric for "similar" in this case. (For example, Apple had a problem with their first shuffle implementation: people did not feel the songs were mixed randomly enough.)

I would reformulate the problem in a different way: "How can we modify some simple shuffling algorithm to achieve the desired result?"

Here is my idea:

  1. We will need a baseline seed s (something large, comparable to the list sizes that are typical for your case/data; this value can be hardcoded or passed as a parameter, but it must stay fixed while s1 varies).
  2. Compute d = abs(s1 - s) (how far s1 is from s).
  3. Do a default shuffle with s as the seed (any shuffle function from your language's random library will do).
  4. Now iterate over the list and, with some probability p in (0,1), swap each element with another element that is at distance at most d from its position.
    Ex: d = 2
    1 2 3 4 5 6 7
    Element 5 can be swapped with 3, 4, 6 or 7.
    Basically, the element at position i is swapped (with probability p) with an element, possibly chosen at random, from the range [i-d, i+d].
  5. Tune p and s until your "similarity" score/feel is satisfied.
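A rough Python sketch of these steps (the `base_seed` default, `p`, and the function name are illustrative assumptions):

```python
import random

def perturbed_shuffle(items, s1, base_seed=1_000_000, p=0.5):
    """Shuffle deterministically from a fixed baseline seed, then apply
    seed-distance-dependent local swaps (steps 1-4 above)."""
    d = abs(s1 - base_seed)               # step 2: distance between seeds
    out = items[:]
    random.Random(base_seed).shuffle(out) # step 3: baseline shuffle
    rng = random.Random(s1)               # deterministic perturbation source
    n = len(out)
    for i in range(n):                    # step 4: local swaps
        if d > 0 and rng.random() < p:
            # Pick a partner within distance d, clamped to the list bounds.
            j = rng.randint(max(0, i - d), min(n - 1, i + d))
            out[i], out[j] = out[j], out[i]
    return out
```

When s1 equals the baseline seed, d is 0 and the result is just the baseline shuffle; as s1 moves away, the swaps reach further, matching the intent of step 4.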
Dan Butmalai