I have a NumPy array of the form

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

Each column represents a data channel, and I need to shuffle the contents of each column independently of the other columns. I understand that numpy.random.shuffle only shuffles along the first axis of the array, i.e., it shuffles the order of the rows, so every column gets the same permutation. What is the best way to carry out an independent shuffle within each column?

Divakar
eniem
  • A simple for loop and treating each column as a normal list may be easiest https://stackoverflow.com/a/976918/7304372 – dǝɥɔS ʇoıןןƎ Mar 22 '18 at 10:45
  • Use @unutbu's [numpy solution](https://stackoverflow.com/a/36273313/9209546) with `axis=0`. Marked as duplicate. – jpp Mar 22 '18 at 10:50
  • `arr[:, n] = numpy.random.permutation(arr[:, n])` where `n` is the column index. – dROOOze Mar 22 '18 at 10:52
  • @droooze, this may work. But I suggest you post on the duplicate answer. – jpp Mar 22 '18 at 10:53
  • @jpp Mine looks closer to @unutbu's one. So, I think we are covered there. Good find there. – Divakar Mar 22 '18 at 10:55
  • @jpp done, I guess the question warrants another answer which is more direct. – dROOOze Mar 22 '18 at 11:11
  • @droooze, I upvoted it as a valid answer. This benefits everyone. In the long run, good answers are more likely to get views/upvotes on a canonical question/answer page than on dups like this. – jpp Mar 22 '18 at 11:12
  • I think I will undelete my solution here, as the linked dup target doesn't show how to translate for per column basis as asked here. Won't vote to re-open though. – Divakar Mar 22 '18 at 11:17
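
The per-column loop suggested in the comments can be sketched as follows. This is a minimal illustration, assuming a small float array named `arr` (the example data and the `original` copy are made up here for demonstration); it permutes one column at a time with `numpy.random.permutation`.

```python
import numpy as np

# Hypothetical example data: 4 rows, 3 channels (columns).
arr = np.arange(12, dtype=float).reshape(4, 3)
original = arr.copy()

# Shuffle each column on its own; permutation() returns a shuffled copy,
# which is then written back into the same column.
for n in range(arr.shape[1]):
    arr[:, n] = np.random.permutation(arr[:, n])
```

Each column keeps its own values but in a new order, and the orders differ between columns. The loop is easy to read, though it may be slower than a vectorized approach for arrays with many columns.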

1 Answer

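We could generate unique row indices for each column and index into the input array with advanced indexing. To generate the unique indices, we would use the random float generation + sort trick: taking the argsort of uniform random floats along the first axis yields an independent random permutation of row indices for each column, thus giving us a vectorized solution, like so -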

import numpy as np
idx = np.random.rand(*a.shape).argsort(0)  # per-column random permutation of row indices
out = a[idx, np.arange(a.shape[1])]        # out[i, j] = a[idx[i, j], j]
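
To see the trick end to end, here is a small self-contained sketch. The function name `shuffle_columns`, the optional `rng` argument, and the sample array are assumptions added for illustration; the indexing itself is the same as above, only using NumPy's newer `default_rng` generator in place of `np.random.rand`.

```python
import numpy as np

def shuffle_columns(a, rng=None):
    """Return a copy of 2-D array `a` with each column shuffled independently."""
    rng = np.random.default_rng() if rng is None else rng
    # One random key per element; argsort along axis 0 turns each column
    # of keys into a random permutation of the row indices.
    idx = rng.random(a.shape).argsort(axis=0)
    # Pair every row index with its own column index (broadcast over rows).
    return a[idx, np.arange(a.shape[1])]

# Hypothetical example data: 4 rows, 3 channels (columns).
a = np.arange(12).reshape(4, 3)
out = shuffle_columns(a, np.random.default_rng(0))
```

Sorting `out` along axis 0 gives back the sorted columns of `a`, which is a quick way to check that values never move between columns.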

Generic version

We could generalize it to cover generic n-dim arrays and generic axes with np.take_along_axis, and end up with something like what is listed in this post.
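
A rough sketch of what such a generalization might look like is given below. This is not necessarily the code from the referenced post; the function name `shuffle_along_axis` and the `rng` parameter are assumptions made here, built only on `np.take_along_axis` and the same random-keys-plus-argsort idea.

```python
import numpy as np

def shuffle_along_axis(a, axis, rng=None):
    """Shuffle every 1-D slice of `a` along `axis` independently (returns a copy)."""
    rng = np.random.default_rng() if rng is None else rng
    # argsort of random keys gives an independent permutation per slice.
    idx = rng.random(a.shape).argsort(axis=axis)
    # take_along_axis applies those per-slice indices along the same axis.
    return np.take_along_axis(a, idx, axis=axis)

# Hypothetical 3-D example: shuffle along the middle axis.
b = np.arange(24).reshape(2, 3, 4)
shuffled = shuffle_along_axis(b, axis=1, rng=np.random.default_rng(0))
```

For the 2-D case in the question, `shuffle_along_axis(a, axis=0)` would shuffle within each column. Recent NumPy releases (1.20 or later, as far as I am aware) also offer `Generator.permuted`, e.g. `np.random.default_rng().permuted(a, axis=0)`, which shuffles each slice along the given axis independently.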

Divakar
  • Perfect, this does exactly what I needed to do. Would not have thought of this on my own. – eniem Mar 24 '18 at 14:14