1

Problem
Assume you have a structured np.array as such:

first_array = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9])

and you would like to create a new np.array of same size, but shuffled.

E.g.

second_array = np.random.shuffle(first_array)

second_array 
# np.array([3, 2, 9, 5, 6, 1, 1, 6, 9, 7, 8, 5, 2, 7, 8, 3, 4, 4])
                      

So far, so good. However, random shuffling leads to some duplicates being very close to each other, which is something I'm trying to avoid.

Question
How do I shuffle this array so that the integer order is pseudo-random but so that each duplicate has high probability to be very distant to each other? Is there a more specific term for this problem?

Novorodnas
  • 85
  • 6
  • I remember that this was a common problem when shuffling music playlists; users complained that "the shuffle is not good enough" because there would often be two consecutive titles from a same artist. As a result, Spotify and pretty much every playlist software are now using "smoothed" shuffle algorithms that are not uniformly random, and avoid adjacent similar titles. – Stef Jan 04 '22 at 10:58
  • Yeah I also remember Spotify had that problem. Good hint, thanks. – Novorodnas Jan 04 '22 at 11:18
  • Similar problems: [Function to make a list as unsorted as possible](https://stackoverflow.com/questions/70508442/function-to-make-a-list-as-unsorted-as-possible/70510656); [A Stack Overflow user's curious problem of maximising unsortedness](https://or.stackexchange.com/questions/7501/a-stack-overflow-users-curious-problem-of-maximising-unsortedness); – Stef Jan 04 '22 at 14:34
  • [Randomising a list whilst ensuring duplicates are not sequential](https://stackoverflow.com/questions/29797136/randomising-a-list-whilst-ensuring-duplicates-are-not-sequential); [Disperse Duplicates in an Array](https://stackoverflow.com/questions/17899312/disperse-duplicates-in-an-array); [An algorithm that spreads similar items across a list](https://softwareengineering.stackexchange.com/questions/262557/an-algorithm-that-spreads-similar-items-across-a-list) – Stef Jan 04 '22 at 14:35

1 Answers1

2

This is more of an algorithmic problem than numpy only. A naive approach can be to split the array with the minimum target spacing (spacing_condition), that numbers should be at least that far apart.

import numpy as np
first_array = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9])
spacing_condition = 3
subarrays = np.split(first_array, spacing_condition)

Next step is to choose sequentially from subarrays in order, this would guarantee the spacing condition, and remove the choice from that subarray along.

However, this last two step, naive loops will be slow for large arrays. Following a naive implementation, seed is there only to reproduce.

np.random.seed(42)

def choose_over_spacing(subarrays):
    choices = []
    new_subarrays_ = []
    subarray_indices = np.arange(len(subarrays[0]))
    for subarray in subarrays:
        index_to_choose = np.random.choice(subarray_indices, 1)[0]
        number_choice = subarray[index_to_choose]
        choices.append(number_choice)
        new_subarray = np.delete(subarray, index_to_choose)
        new_subarrays_.append(new_subarray)
    return choices, new_subarrays_

all_choices = []
for _ in np.arange(len(subarrays[0])):
    choices, subarrays =  choose_over_spacing(subarrays)
    all_choices = all_choices + choices

Inspecting the resulting, we see that we guarantee that duplicated numbers are at least 3 numbers apart, as we condition with spacing_condition, one could choose different spacing condition as long as initial split works.

[2, 6, 8, 3, 6, 7, 2, 5, 9, 3, 4, 9, 1, 4, 8, 1, 5, 7]
patagonicus
  • 203
  • 2
  • 10
  • Thank you very much for your support. I was thinking about solving it similarily and this method works great. I am now trying to figure out how to do the same but with 2D arrays. I stupidly forgot to mention that this is my goal. – Novorodnas Jun 21 '21 at 12:36
  • @Novorodnas For 2D, you would need to define spacing condition in the other dimension too, similarly. – patagonicus Jun 21 '21 at 15:37
  • I'll use the euclidean distance to do that.. thanks a lot! – Novorodnas Jun 22 '21 at 08:49