I have a NumPy array and want to extract some random rows from it. This solution does that, but my case is a bit more specific: I want to pick random rows based on the last column of my array. That column holds a few unique values, and I want to draw random rows within each unique group. This is my array:

import numpy as np
my_array = np.array([['1.66', '1.67', 'group_B'],
                     ['1.', '5.65', 'group_C'],
                     ['9.06', '10.2', 'group_B'],
                     ['2.', '0.2', 'group_B'],
                     ['11.', '2.1', 'group_C']])

For example, the last column of my data contains two groups. From each group, I want to randomly extract n rows (for example, 1 row per group). I would appreciate any help doing this in Python.

Peter O.
Ali_d

1 Answer

This is one way you could do it:

Code

import numpy as np
import random

max_items = 2  # number of rows to draw per group (n in the question)
my_array=np.array([['1.66', '1.67', 'group_B'],
                   ['1.', '5.65', 'group_C'],
                   ['9.06', '10.2', 'group_B'],
                   ['2.', '0.2', 'group_B'],
                   ['2.', '0.2', 'group_B'],
                   ['2.', '0.2', 'group_C'],
                   ['11.', '2.1', 'group_C']])

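# Unique group labels found in the last column
classes = np.unique(my_array[:,-1])
new_array = []
for cls in classes:
    # Draw max_items rows from this group; each random.choice call is
    # independent, so the same row can be picked more than once
    for i in range(max_items):
        new_array.append(random.choice(my_array[my_array[:,-1]==cls]))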

print(*new_array, sep='\n')

Output

['2.' '0.2' 'group_B']
['2.' '0.2' 'group_B']
['2.' '0.2' 'group_C']
['1.' '5.65' 'group_C']
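
Note that the loop above calls random.choice once per draw, so the same row can come back more than once for a group. If distinct rows are wanted, one possible variation is to sample row indices without replacement using NumPy's Generator.choice. This is only a sketch: the helper name sample_rows_per_group and its seed parameter are made up for illustration and are not part of the answer above, and it assumes that a group with fewer than n rows should simply return all of its rows.

```python
import numpy as np

def sample_rows_per_group(arr, n, seed=None):
    """Return up to n distinct random rows for each unique label in the last column.

    Hypothetical helper for illustration; groups with fewer than n rows
    contribute all of their rows (an assumption, not stated in the question).
    """
    rng = np.random.default_rng(seed)
    labels = arr[:, -1]
    picked = []
    for label in np.unique(labels):
        # Row indices belonging to this group
        idx = np.flatnonzero(labels == label)
        # Never ask for more rows than the group contains
        k = min(n, idx.size)
        # replace=False guarantees that no index is drawn twice
        picked.append(rng.choice(idx, size=k, replace=False))
    return arr[np.concatenate(picked)]

my_array = np.array([['1.66', '1.67', 'group_B'],
                     ['1.', '5.65', 'group_C'],
                     ['9.06', '10.2', 'group_B'],
                     ['2.', '0.2', 'group_B'],
                     ['11.', '2.1', 'group_C']])

# One random row per group; pass a seed only if reproducible output is needed
print(sample_rows_per_group(my_array, n=1))
```

Sampling indices rather than rows keeps the result a regular 2-D array, so it can be sliced or stacked like the original data.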
Abhi_J
  • Dear @Abhi_J, thanks for your fantastic solution. I have a question: what is `max_items`? Is it the number of groups that I have, or is it the number of random rows that I get? – Ali_d Jun 02 '21 at 09:23
  • @Ali_d, it is `n` in your question: *From each group, I want to extract randomly n number of rows (for example 1 row per group)* – Abhi_J Jun 02 '21 at 09:26