I have an array data set with shape (10, 170).

Name: id_matrix

array([[   1,  171,  341, ..., 1191, 1361, 1531],
       [   2,  172,  342, ..., 1192, 1362, 1532],
       [   3,  173,  343, ..., 1193, 1363, 1533],
       ...,
       [ 168,  338,  508, ..., 1358, 1528, 1698],
       [ 169,  339,  509, ..., 1359, 1529, 1699],
       [ 170,  340,  510, ..., 1360, 1530, 1700]])

I would like to loop through each of the 170 columns, each of which also contains 170 numbers, and randomly select five numbers from it. I will then print them to the screen as a group, in the outline below; I will be able to format the output properly once the code is working correctly.

Group 1: [ 92  73 139  54 147]
Group 2: [182 333 219 292 214]

I also need to set np.random.seed(489) for replication and repeatability. I tried to capture these values but am getting stuck.

col=0
data=[row[col] for row in id_matrix]
print(data)

or this version:

import pandas as pd
df[df.columns.to_series().sample(5)]

Neither approach produces what I want. I ran Google searches but could not find any leads on how to write the loop that draws a random set from each of these columns.

Please advise...

ekhumoro
Johnny
  • Maybe https://stackoverflow.com/q/232237/5987 can provide some guidance? – Mark Ransom Jan 17 '20 at 18:38
  • I would use `np.random.randint` to generate 5 random index values, then use those to index into your array. To clarify one point, though: You have 10 rows, 170 columns, and the value (for example) at `[9,9]` will itself be a sublist of 170 numbers? – G. Anderson Jan 17 '20 at 19:01

1 Answer

My solution from your comments:

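import random

random.seed(436)  # seed once, before the loop, so the whole run is reproducible
for i in range(10):
  # random.sample needs a sequence (sets are rejected in Python 3.11+)
  sampling = random.sample(id_matrix[:, i].tolist(), k=5)
  print("Group {}: {}".format(i+1, sampling))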
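
For comparison, here is a minimal NumPy-only sketch of the same loop. It rests on a few assumptions that are not confirmed in the question: that id_matrix is laid out as printed (170 rows by 10 columns, ids 1 to 1700 filled column by column), and that using NumPy's newer Generator API (np.random.default_rng) is acceptable in place of the legacy np.random.seed call. The `groups` list is a hypothetical name added only to keep the results around.

```python
import numpy as np

# Assumption: rebuild a matrix like the one printed in the question,
# 170 rows by 10 columns, holding the ids 1..1700 column by column.
id_matrix = np.arange(1, 1701).reshape(10, 170).T

# One seeded generator for the whole run; 489 is the seed from the question.
rng = np.random.default_rng(489)

groups = []  # hypothetical container for the sampled ids
for col in range(id_matrix.shape[1]):
    # replace=False guarantees five distinct ids from this column.
    picked = rng.choice(id_matrix[:, col], size=5, replace=False)
    groups.append(picked)
    print("Group {}: {}".format(col + 1, picked))
```

Creating the generator once, outside the loop, keeps the run repeatable while still letting each column get its own independent draw.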
Johnny