3

I have this issue where I would like to shuffle a list, but only do so slightly. Say, I want only a small number of elements to be moved. Is there a simple way to get this done?

Right now the best I can think of is building my own method by hand, but is there some way to use the random library to do this for me?

Math chiller

6 Answers

3

One interpretation is to strongly or weakly retain the initial ordering. The weakest retention would be a completely random shuffle, the strongest would be to not deviate from the initial ordering.

This can be accomplished by creating a tuple consisting of the original index scaled by a constant, plus some randomness, followed by the value. Sort the tuples, then iterate through to recover the original values in their new order. If the scale factor for the index is near zero, the new order will be random. If it's near 1, things will tend to strongly but not perfectly retain their original ordering. If it's larger, the result becomes unlikely to be shuffled.

import random

orderliness = 0.75

def tuplify(x, y):
    return (orderliness * y + random.gauss(0,1), x)

values = [i+1 for i in range(20)]
print(values)
pairs = list(map(tuplify, values, range(len(values))))
pairs.sort()
partially_ordered_values = [p[1] for p in pairs]
print(partially_ordered_values)

This produces, for example:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]  # initial ordering
[2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 11, 14, 17, 16, 15, 18, 19, 20]  # weakly shuffled

Tendency to shuffle would be determined by the relative magnitudes of orderliness and the standard deviation in random.gauss().
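
A minimal sketch of the same idea wrapped in a function, so that both knobs mentioned above can be varied. The name weak_shuffle and the sigma parameter are assumptions made for illustration, not part of the code above:

```python
import random

def weak_shuffle(values, orderliness=0.75, sigma=1.0):
    """Return a new list that tends to keep the order of `values`."""
    # Noisy sort key per element: scaled original index plus Gaussian noise.
    # Pairing the key with the index means the values never get compared.
    keyed = [(orderliness * i + random.gauss(0, sigma), i)
             for i in range(len(values))]
    keyed.sort()
    return [values[i] for _, i in keyed]

print(weak_shuffle(list(range(1, 21))))                   # mostly ordered
print(weak_shuffle(list(range(1, 21)), orderliness=0.1))  # much more mixed
```

Raising orderliness relative to sigma keeps elements closer to where they started; with sigma=0 the input order comes back unchanged.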

pjs
3

to show what some of these solutions are doing, I find it helps to run each one many times (Monte Carlo style) and look at the distribution

first a tidied up version of @meta4's solution as it was the most fleshed out:

from random import randrange

def partial_shuffle(l, factor=5):
    n = len(l)
    for _ in range(factor):
        a, b = randrange(n), randrange(n)
        l[b], l[a] = l[a], l[b]

which we can run many times by doing:

import numpy as np

n = 8
orig = list(range(n))
occur = np.zeros((n, n), int)

for _ in range(100000):
    x = orig[:]
    partial_shuffle(x)  # default factor=5, i.e. five random swaps
    occur[orig,x] += 1

if we print out the occurrences table as percentages we get:

[[33.5  9.6  9.5  9.4  9.4  9.6  9.5  9.5]
 [ 9.6 33.2  9.7  9.5  9.6  9.6  9.4  9.4]
 [ 9.5  9.6 33.2  9.5  9.6  9.5  9.6  9.5]
 [ 9.5  9.3  9.6 33.4  9.5  9.5  9.5  9.6]
 [ 9.4  9.6  9.4  9.6 33.3  9.5  9.7  9.5]
 [ 9.6  9.5  9.6  9.6  9.4 33.3  9.5  9.6]
 [ 9.4  9.7  9.5  9.5  9.5  9.6 33.2  9.7]
 [ 9.5  9.5  9.6  9.5  9.7  9.5  9.6 33.2]]

each row represents the probability of the item moving to the column. in this case (when n=8) the algorithm will tend to leave elements where they were ~33% of the time, and then pick the remainder uniformly
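
The ~33% diagonal can also be checked analytically. With n positions, one swap of two independently chosen indexes leaves a given item in place with probability ((n-1)/n)^2 + 1/n^2 and sends it to any particular other slot with probability 2/n^2, so after k swaps the chance of sitting in the starting slot works out to 1/n + (1 - 1/n)(1 - 2/n)^k. A small sketch of that formula (the function name is made up here):

```python
def stay_probability(n, k):
    """Chance that an item sits in its original slot after k random swaps."""
    return 1 / n + (1 - 1 / n) * (1 - 2 / n) ** k

n, k = 8, 5
stay = stay_probability(n, k)
move = (1 - stay) / (n - 1)  # the remaining mass is spread evenly over the other slots
print(f"stay: {stay:.1%}, each other slot: {move:.1%}")
```

For n = 8 and k = 5 swaps (the default factor) this gives about 33.3% and 9.5%, in line with the table.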

I can then run (a tidied up) version of pjs's code with:

from random import gauss

orderliness = 2

occur = np.zeros((n, n), int)

for _ in range(100000):
    x = sorted(orig, key=lambda i: gauss(i * orderliness, 1))
    occur[orig,x] += 1

which gives very different output:

[[91.9  7.9  0.1  0.   0.   0.   0.   0. ]
 [ 7.9 84.1  7.8  0.1  0.   0.   0.   0. ]
 [ 0.1  7.8 84.1  7.9  0.1  0.   0.   0. ]
 [ 0.   0.1  7.9 84.1  7.7  0.1  0.   0. ]
 [ 0.   0.   0.1  7.7 84.2  7.8  0.1  0. ]
 [ 0.   0.   0.   0.1  7.9 84.2  7.7  0.1]
 [ 0.   0.   0.   0.   0.1  7.7 84.2  7.9]
 [ 0.   0.   0.   0.   0.   0.1  7.9 91.9]]

i.e. items tend to remain close to where they started

this sort of table is great at detecting bias in the distribution, which there doesn't seem to be evidence of above. but, for example, Artyom's solution (shuffle(x, lambda: random() / 5)) gives the following:

[[  0.   37.4   0.    0.    0.   16.7  23.8  22.1]
 [  0.    0.  100.    0.    0.    0.    0.    0. ]
 [  0.    0.    0.  100.    0.    0.    0.    0. ]
 [  0.    0.    0.    0.  100.    0.    0.    0. ]
 [  1.7   0.    0.    0.    0.   83.3  11.9   3. ]
 [  9.    7.4   0.    0.    0.    0.   64.2  19.4]
 [ 26.7  17.9   0.    0.    0.    0.    0.   55.5]
 [ 62.6  37.4   0.    0.    0.    0.    0.    0. ]]

which probably isn't what the OP wanted. the high probability off diagonal represents rotating the array by one element
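
to see where the rotation comes from, here is a sketch of the loop that older versions of random.shuffle ran when given a custom random function (the `random` argument was removed in Python 3.11, so this re-creates it by hand; legacy_shuffle is a hypothetical name):

```python
import random

def legacy_shuffle(x, rand):
    """Sketch of the loop old random.shuffle ran when given a custom `rand`."""
    for i in reversed(range(1, len(x))):
        j = int(rand() * (i + 1))  # a small rand() pushes j towards index 0
        x[i], x[j] = x[j], x[i]

a = list(range(1, 10))
legacy_shuffle(a, lambda: 0.0)  # extreme case: j is always 0
print(a)  # [2, 3, 4, 5, 6, 7, 8, 9, 1], the input rotated by one

b = list(range(1, 10))
legacy_shuffle(b, lambda: random.random() / 5)  # the biased generator from the answer
print(b)
```

every step swaps x[i] with a slot near the front, so the element that started at the front is carried to the back while the rest shift down by one.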

Sam Mason
  • 1
    Interesting, but the behavior in my solution depends heavily on `orderliness` coefficient. In fact, if you make it negative, it starts having a propensity to reverse the data, while if it's zero the results will be indistinguishable from a random shuffle. In other words, you can tune the algorithm to whatever degree of disorder you want relative to the starting positions. – pjs Jun 18 '20 at 23:03
  • 1
    That property of having randomness while maintaining an ordering propensity makes it fun to use for benchmarking sorting algorithms where you want to study the impact of order, reverse order, or randomness on the effectiveness of the sort. – pjs Jun 18 '20 at 23:09
  • 1
    @pjs have changed the code to make your `orderliness` parameter more explicit. note that other distributions might be interesting as well. e.g. Cauchy would sometimes put elements a long way away, which might be useful/interesting – Sam Mason Jun 19 '20 at 09:57
1
from random import randrange

def partial_shuffle(l, factor=5):
    for _ in range(factor):
        # pick two random indexes; randrange(len(l)) stays in range,
        # whereas randint(0, len(l)) can return len(l) and raise IndexError
        a, b = randrange(len(l)), randrange(len(l))
        l[b], l[a] = l[a], l[b] # swap the values at those indexes
    return l

This is the partial Fisher-Yates Shuffle @rossum recommended.

''.join(partial_shuffle(list('abcdefghijklmnopqrstuvwxyz'), 2))

This example yielded "abcdefnhijklmgopqrsyuvwxtz" on one run, but will yield something else on a different run.

meta4
0

Use the shuffle function of Python's random module. It takes a list and an optional random argument, which is a function returning a float in the range [0.0, 1.0). shuffle uses that function to decide where elements go, so overriding it lets you bias the result. (The random argument was deprecated in Python 3.9 and removed in 3.11, so the code below needs an older version.)

import random

def rand():
    return random.random() / 5

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
random.shuffle(arr, random=rand)
# OUTPUT: [9, 3, 4, 5, 6, 7, 8, 1, 2]
Artyom Vancyan
  • Could you explain how the second parameter is used? – Math chiller Jun 17 '20 at 19:55
  • The `shuffle` function iterates over the list and calls the random function; if the function returns a number less than 0.5, the current element doesn't change position – Artyom Vancyan Jun 17 '20 at 20:01
  • for that reason I wrote a function which usually returns numbers less than 0.5 – Artyom Vancyan Jun 17 '20 at 20:08
  • Actually the `random` argument is more like a seed for the shuffling. `random.shuffle(arr, lambda: 0.2)` yields `[5, 3, 4, 1, 6, 7, 8, 9, 2]` always. – Jan Joswig Jun 17 '20 at 20:43
  • 1
    the source of [random.shuffle](https://github.com/python/cpython/blob/e8056180a13b6755e4e3e5505b7bf03f79da29fb/Lib/random.py#L310) suggests it's not designed to be used like this. biasing it to smaller values will cause it to approach a list that has been rotated by one element – Sam Mason Jun 18 '20 at 22:11
0

One could also interpret slightly shuffled in the sense that there is a probability for shuffling elements at every step of the Fisher-Yates algorithm @rossum and @meta4 mentioned (instead of having a fixed number of elements shuffled).

import random

def conditional_fy(l, p):
    """Shuffle elements of a list with a given probability

    Args:
        l: list
        p: shuffle probability
            (0: elements are never shuffled,
             1: elements are always shuffled)

    """
    assert 0 <= p <= 1

    for i in range(len(l) - 1, 0, -1):
        shuffle = random.random()
        if shuffle < p:
            j = random.randint(0, i - 1)
            l[i], l[j] = l[j], l[i]
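
A quick way to get a feel for p is to count how many elements end up away from their starting position. The sketch below repeats the function so that it runs on its own; moved_count is a hypothetical helper. Because j is drawn from 0..i-1 (never i itself), p = 1 corresponds to Sattolo's variant of the algorithm, in which every element moves.

```python
import random

def conditional_fy(l, p):
    for i in range(len(l) - 1, 0, -1):
        if random.random() < p:
            j = random.randint(0, i - 1)
            l[i], l[j] = l[j], l[i]

def moved_count(n, p):
    """Shuffle range(n) with probability p and count displaced elements."""
    data = list(range(n))
    conditional_fy(data, p)
    return sum(1 for pos, value in enumerate(data) if pos != value)

for p in (0.0, 0.1, 0.5, 1.0):
    print(p, moved_count(20, p))
```

With p = 0 nothing moves and with p = 1 all 20 elements move; values in between vary from run to run.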
Jan Joswig
-1

Use a Fisher-Yates shuffle, but do not run it for the entire list. Just run one step for each entry you want moved: 5 steps to move 5 entries, 10 steps to move 10 entries.
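
One possible reading of this suggestion, as a sketch: run the usual Fisher-Yates loop but stop after a given number of steps. The function name and the choice to work from the front of the list are assumptions.

```python
import random

def truncated_fisher_yates(items, steps):
    """Run only the first `steps` iterations of a Fisher-Yates shuffle, in place."""
    n = len(items)
    for i in range(min(steps, n - 1)):
        j = random.randrange(i, n)  # j may equal i, so a step can be a no-op
        items[i], items[j] = items[j], items[i]
    return items

print(truncated_fisher_yates(list(range(20)), 3))
```

Each step touches at most two positions, so at most 2 * steps elements move. As written, slots 0 to steps-1 are always the ones involved, so elements are not all equally likely to be displaced.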

rossum
  • That doesn't seem to me like something that would work, care to elaborate? – Math chiller Jun 17 '20 at 19:12
  • The normal F-Y shuffle loops over the entire list from one end to the other, shuffling all the entries. I am suggesting stopping the loop after a few iterations so only a few list entries are moved. Rather than looping over 0 .. n-1, loop over 0 .. 5 to move just 5 entries. – rossum Jun 17 '20 at 19:25
  • Yeah, that idea isn't fleshed out at all, and as far as I can tell won't work. The whole idea in F-Y depends essentially on going through the entire list. – Math chiller Jun 17 '20 at 19:34
  • Not all items would be equally likely to be displaced. – pjs Jun 17 '20 at 20:48
  • 1
    @pjs Then pick the positions to shuffle randomly rather than serially. – rossum Jun 17 '20 at 21:31
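
Following that last comment, a sketch that picks the positions at random with random.sample and shuffles only the values at those positions. shuffle_k_positions is a made-up name, and this is one interpretation rather than necessarily what the commenter had in mind:

```python
import random

def shuffle_k_positions(items, k):
    """Return a copy of `items` with the values at k random positions shuffled."""
    result = list(items)
    positions = random.sample(range(len(result)), k)  # k distinct indexes
    values = [result[i] for i in positions]
    random.shuffle(values)  # shuffle only the selected values
    for i, v in zip(positions, values):
        result[i] = v
    return result

print("".join(shuffle_k_positions("abcdefghijklmnopqrstuvwxyz", 4)))
```

At most k elements change position (a selected value can land back in its own slot), and every position is equally likely to be selected. k must not exceed the length of the list, otherwise random.sample raises ValueError.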