
I have the following two-dimensional array:

seq_length = 5
x = np.array([[0, 2, 0, 4], [5,6,7,8]])
x_repeated = np.repeat(x, seq_length, axis=1)


[[0 0 0 0 0 2 2 2 2 2 0 0 0 0 0 4 4 4 4 4]
 [5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8]]

I want to shuffle x_repeated in blocks of seq_length, so that all items of a sequence are moved together.

For example, possible shuffle:

[[0 0 0 0 0 6 6 6 6 6 0 0 0 0 0 8 8 8 8 8]
 [5 5 5 5 5 2 2 2 2 2 7 7 7 7 7 4 4 4 4 4]]

Thanks

Night Walker

5 Answers


You can do something like this:

import numpy as np

seq_length = 5
x = np.array([[0, 2, 0, 4], [5, 6, 7, 8]])

swaps = np.random.choice([False, True], size=4)

for swap_index, swap in enumerate(swaps):
    if swap:
        x[[0, 1], swap_index] = x[[1, 0], swap_index]

x_repeated = np.repeat(x, seq_length, axis=1)

You can also rely on the fact that True is non-zero, and replace the for loop with:

for swap_index in swaps.nonzero()[0]:
    x[[0, 1], swap_index] = x[[1, 0], swap_index]

The key is that I did the shuffling/swapping before the np.repeat call, which will be much more efficient compared to doing it afterwards (while meeting your requirement of sequences of values needing to be swapped). There is a 50% chance for each pair of sequences of the same values to be swapped.
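The same swap-before-repeat idea can also be written without a Python loop. The following is a minimal sketch, not part of the original answer: the helper name swap_then_repeat and its rng parameter are assumptions made for illustration, and it assumes exactly two rows, as in the question.

```python
import numpy as np


def swap_then_repeat(x, seq_length, rng=None):
    """Randomly swap the two rows column by column, then repeat each column."""
    rng = np.random.default_rng() if rng is None else rng
    # One boolean per column: True means "swap the two rows in this column".
    swaps = rng.random(x.shape[1]) < 0.5
    # x[::-1] is x with its two rows reversed; take from it where swaps is True.
    shuffled = np.where(swaps, x[::-1], x)
    return np.repeat(shuffled, seq_length, axis=1)


x = np.array([[0, 2, 0, 4], [5, 6, 7, 8]])
result = swap_then_repeat(x, seq_length=5)
print(result)
```

Unlike the loop version, this sketch leaves x unmodified and returns a new array.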

Mario Ishac

import numpy as np


m = np.array([[0, 2, 0, 4], [5, 6, 7, 8]])

def np_shuffle(m, n_duplicate=5):
    # Take the shape from the argument itself, not from a global at definition time
    m_rows, m_cols = m.shape
    # Flatten the numpy matrix (flatten returns a copy, so the input is untouched)
    flat = m.flatten()
    # Randomize the flattened matrix
    np.random.shuffle(flat)
    # Duplicate elements
    flat = np.repeat(flat, n_duplicate)
    # Return the reshaped numpy array
    return flat.reshape(m_rows, n_duplicate * m_cols)

r = np_shuffle(m)

print(r)

# [[8 8 8 8 8 5 5 5 5 5 2 2 2 2 2 0 0 0 0 0]
#  [0 0 0 0 0 7 7 7 7 7 4 4 4 4 4 6 6 6 6 6]]
Laurent B.

I managed to solve it the following way:

items_count = x.shape[-1]    
swap_flags = np.repeat(np.random.choice([0, 1], size=items_count), seq_length)

gives:

[1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1]

for idx, flag in enumerate(swap_flags):
    if flag:
        x_repeated[0,idx], x_repeated[1,idx] = x_repeated[1,idx], x_repeated[0,idx]

Result:

[[5 5 5 5 5 6 6 6 6 6 0 0 0 0 0 8 8 8 8 8]
 [0 0 0 0 0 2 2 2 2 2 7 7 7 7 7 4 4 4 4 4]]

It is still not a very elegant NumPy solution, though.
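One way to vectorize the element-by-element loop above is a boolean mask over the columns. This is only a sketch under the question's assumptions (two rows, every block exactly seq_length wide); the name swap_mask is introduced here for illustration.

```python
import numpy as np

seq_length = 5
x = np.array([[0, 2, 0, 4], [5, 6, 7, 8]])
x_repeated = np.repeat(x, seq_length, axis=1)

rng = np.random.default_rng()
# One flag per original column, expanded to one flag per repeated column.
flags = rng.integers(0, 2, size=x.shape[-1]).astype(bool)
swap_mask = np.repeat(flags, seq_length)
# Boolean indexing on the right-hand side produces a copy first, so the
# assignment never reads values it has already overwritten.
x_repeated[:, swap_mask] = x_repeated[::-1, swap_mask]
print(x_repeated)
```

The swap happens in x_repeated itself, so no second full-size array is kept around afterwards.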

Night Walker

Here's my attempt:

def randomize_blocks(arr):
  """ Shuffles an n-dimensional array given consecutive blocks of numbers.
  """
  groups = (np.diff(arr.ravel(), prepend=0) != 0).cumsum().reshape(arr.shape)
  u, c = np.unique(groups, return_counts=True)
  np.random.shuffle(u)
  o = np.argsort(u)
  return arr.ravel()[np.argsort(np.repeat(u, c[o]))].reshape(arr.shape)

Breakdown

First we get the groups

groups = (np.diff(arr.ravel(), prepend=0) != 0).cumsum().reshape(arr.shape)

array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7]])

Then, we get the unique group labels and the count for each group, and shuffle the labels.

u, c = np.unique(groups, return_counts=True)
np.random.shuffle(u)

>>> u, c
(array([6, 0, 3, 5, 2, 4, 7, 1]),
 array([5, 5, 5, 5, 5, 5, 5, 5]))

Finally, we reconstruct the array, using argsort to re-order the elements according to the shuffled group labels.

o = np.argsort(u)
arr.ravel()[np.argsort(np.repeat(u, c[o]))].reshape(arr.shape)

Example usage:

>>> randomize_blocks(arr)
array([[0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5],
       [7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6]])

>>> randomize_blocks(arr)
array([[6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 2, 2, 2, 2, 2]])
rafaelc
  • Thanks for the answer, but it does not address what I asked. You shuffled across both rows and columns; I asked only about shuffling between rows. – Night Walker May 16 '20 at 05:42

Here is a solution that is completely in-place and does not require allocating and generating random indices:

import numpy as np


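def row_block_shuffle(a: np.ndarray, seq_len: int):
    # Shuffles the rows of `a` in place, one block of seq_len columns at a time
    cols = a.shape[1]
    rng = np.random.default_rng()
    # Each block is a (rows, seq_len) view into `a`, so shuffling it modifies `a`
    for block in a.T.reshape(cols // seq_len, seq_len, -1).transpose(0, 2, 1):
        rng.shuffle(block)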


if __name__ == "__main__":
    seq_length = 5
    x = np.array([[0, 2, 0, 4], [5, 6, 7, 8]])
    x_repeated = np.repeat(x, seq_length, axis=1)

    row_block_shuffle(x_repeated, seq_length)
    print(x_repeated)

Output:

[[5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8]
 [0 0 0 0 0 2 2 2 2 2 0 0 0 0 0 4 4 4 4 4]]

What I do is create "blocks" that share memory with the original array (shown here before shuffling):

>>> print(x_repeated.T.reshape(x_repeated.shape[1] // seq_length, seq_length, -1).transpose(0, 2, 1))
[[[0 0 0 0 0]
  [5 5 5 5 5]]

 [[2 2 2 2 2]
  [6 6 6 6 6]]

 [[0 0 0 0 0]
  [7 7 7 7 7]]

 [[4 4 4 4 4]
  [8 8 8 8 8]]]

Then I shuffle each "block", which in turn shuffles the original array as well. I believe this is the most efficient solution for large arrays, as it is as in-place as it can be. This answer at least backs up my hypothesis:

https://stackoverflow.com/a/5044364/13091658
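The claim that the blocks are views rather than copies can be checked directly. The sketch below uses np.shares_memory and then writes through one block to show the change appearing in the original array; the variable names n_rows, n_cols and blocks are introduced here for illustration.

```python
import numpy as np

seq_length = 5
x = np.array([[0, 2, 0, 4], [5, 6, 7, 8]])
x_repeated = np.repeat(x, seq_length, axis=1)
n_rows, n_cols = x_repeated.shape

# Same construction as in row_block_shuffle: one (n_rows, seq_length) block per sequence.
blocks = x_repeated.T.reshape(n_cols // seq_length, seq_length, n_rows).transpose(0, 2, 1)

# True when the blocks are views into x_repeated rather than copies.
print(np.shares_memory(blocks, x_repeated))

# Reverse the rows of the first block only; the copy avoids reading
# from memory that is being overwritten during the assignment.
blocks[0] = blocks[0][::-1].copy()
print(x_repeated[:, :seq_length])
```

If the reshape had needed a copy, shuffling the blocks would silently leave x_repeated unchanged, so this check is worth running when the array layout differs from the one in the question (for example, a non-contiguous input).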

Also, the general problem you are facing is shuffling "sliding window views" of your array. If you would like to work on "windows" that move both horizontally and vertically within your array, you can see my previous answers to problems related to sliding windows here:

https://stackoverflow.com/a/67416335/13091658

https://stackoverflow.com/a/69924828/13091658

Naphat Amundsen