
I have a number of data files, each containing a large number of data points.

After loading the file with numpy, I get a numpy array:

f=np.loadtxt("...-1.txt")

How do I randomly select a run of x consecutive values without changing their order?

For example:

f = [1,5,3,7,4,8]

If I wanted to select a random run of 3 consecutive data points, the output could be:

  1. 1,5,3, or
  2. 3,7,4, or
  3. 5,3,7, etc.
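To make the set of acceptable outputs concrete, here is a small plain-Python sketch that enumerates every run of 3 consecutive values in the example list. A random selection should return exactly one of these:

```python
f = [1, 5, 3, 7, 4, 8]
x = 3  # length of the run to select

# Every contiguous run of length x, in the original order
windows = [f[i:i + x] for i in range(len(f) - x + 1)]
print(windows)  # [[1, 5, 3], [5, 3, 7], [3, 7, 4], [7, 4, 8]]
```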
edited by Jongware, asked by msap
    Possible duplicate of [Get random sample from list while maintaining ordering of items?](https://stackoverflow.com/questions/6482889/get-random-sample-from-list-while-maintaining-ordering-of-items) – dgomzi Jul 06 '18 at 08:36

2 Answers


Pure logic will get you there.

For a list f and a slice length x, the valid starting indices of your random slices run from 0 to len(f)-x inclusive, which for x = 3 is 0 to 3:

     0 1 2 3
f = [1,5,3,7,4,8]

So a valid starting point can be chosen with random.randrange(len(f)-x+1); the +1 is needed because randrange, like range, excludes its upper bound.
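A quick check of that bound, as a standard-library-only sketch (the seed is arbitrary and only makes the run repeatable): every index randrange produces falls in the valid range.

```python
import random

f = [1, 5, 3, 7, 4, 8]
x = 3

# randrange(n) returns an int in [0, n), so n = len(f) - x + 1 yields 0 .. len(f) - x
valid_starts = list(range(len(f) - x + 1))
print(valid_starts)  # [0, 1, 2, 3]

random.seed(0)
samples = {random.randrange(len(f) - x + 1) for _ in range(1000)}
print(sorted(samples))  # drawn start indices, all within valid_starts
```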

Store the random starting point in a variable start and slice your array with [start:start+x], or be creative and apply a second slice to the result of the first:

import random
result = f[random.randrange(len(f)-x+1):][:x]
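The same idea with the start index kept in its own variable, as a minimal sketch. `random_window` is a hypothetical helper name, not part of any library; the slicing inside it works the same way on a 1-D NumPy array such as the one returned by np.loadtxt.

```python
import random

def random_window(seq, x):
    """Return x consecutive items of seq, starting at a random valid index."""
    if not 0 <= x <= len(seq):
        raise ValueError("x must be between 0 and len(seq)")
    start = random.randrange(len(seq) - x + 1)  # 0 .. len(seq) - x inclusive
    return seq[start:start + x]  # order of the items is preserved

f = [1, 5, 3, 7, 4, 8]
result = random_window(f, 3)
print(result)  # one of [1, 5, 3], [5, 3, 7], [3, 7, 4], [7, 4, 8]
```

Validating x up front turns a confusing `ValueError` from randrange (raised when the range is empty) into a clearer message.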
answered by Jongware
  • This works just perfect! Thank you for providing the details for the logic used in the code! It helps a lot! – msap Jul 06 '18 at 08:55
  • Isn't it more efficient (especially when taking for example the first 3 elements out of a million element array) to take out only the specific elements you need? – Robin De Schepper Sep 13 '20 at 17:37
  • @Robin: it probably would, but due to its syntax you then need to store the random start in a variable first. But yes, with million-plus lists you'd probably benefit from that tiny overhead. – Jongware Sep 14 '20 at 07:37

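Building on usr2564301's answer, you can take out only the elements you need in one go by indexing with a range (this requires f to be a NumPy array; a plain Python list cannot be indexed with a range), so you avoid building a potentially very large intermediate array: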

start = random.randrange(len(f)-x+1)
result = f[range(start, start+x)]

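Be aware that NumPy converts the range into an integer index array and returns a copy of the x selected elements. On a NumPy array, a basic slice such as f[start:start+x], or the chained slice f[start:][:x], already returns a view without copying any data, so neither form builds a large intermediate array there.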
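To check how each form behaves on an actual NumPy array, here is a sketch (the array contents are just the example values from the question) that uses `np.shares_memory` to report whether a result is a view into `f` or an independent copy:

```python
import random

import numpy as np

f = np.array([1, 5, 3, 7, 4, 8])
x = 3
start = random.randrange(len(f) - x + 1)

sliced = f[start:start + x]            # basic slicing: a view into f
chained = f[start:][:x]                # slice of a slice: still a view
indexed = f[range(start, start + x)]   # range becomes an index array: a copy

print(np.array_equal(sliced, indexed))  # True
print(np.shares_memory(f, sliced))      # True
print(np.shares_memory(f, chained))     # True
print(np.shares_memory(f, indexed))     # False
```

Because a view shares memory with `f`, writing into `sliced` would also change `f`; call `.copy()` on the slice if an independent array is needed.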

answered by Robin De Schepper