Average value of a metric over different pseudo-random runs on two numpy arrays with different structures

Question

Given the following Python data structures (df has as many rows as labels has values):

df = 
[[0.003 0.9238 0.3882 0.3823]
[0.0383 0.2382 0.8328 0.3823]
...
[0.723 0.3282 0.1372 0.3723]]

labels = [0 1 0 0 ... 2]

I have a score function that, given df and its labels, computes the value of a metric. The problem is that it does not scale, so I want to approximate its result by averaging the score over several random draws of 100 rows each.

seed = 12345
N = 5
score_sum = 0
# Repeat N times and average
for i in range(0, N):
   # Shuffle df and labels in the same way and select 100 points
   score_sum += score(subset_df, subset_labels)

score_sum = score_sum / N

Note that after shuffling, the same indices must be selected for both df and labels.

Does this answer your question? [Better way to shuffle two numpy arrays in unison](https://stackoverflow.com/questions/4601373/better-way-to-shuffle-two-numpy-arrays-in-unison) — Kenny, Jun 08 '22 at 09:28
@Kenny The post you linked is not very clear to me. In addition, I would like to be able to specify the seed. — Carola, Jun 08 '22 at 09:34

score 0 · Accepted Answer · answered Jun 08 '22 at 09:45

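You could draw one array of random indices per run and use it to index both df and labels, so each row stays paired with its label. Creating the random generator from seed makes the draws reproducible:

import numpy as np

seed = 12345
N = 5
rng = np.random.default_rng(seed)  # seeded generator, so the runs are repeatable
score_sum = 0
# Repeat N times and average
for i in range(N):
    # Draw 100 distinct row indices; the same indices select from both arrays
    rnd_indices = rng.choice(len(df), size=100, replace=False)
    subset_df, subset_labels = df[rnd_indices], labels[rnd_indices]
    score_sum += score(subset_df, subset_labels)

score_sum = score_sum / N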
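
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The score function below is only a placeholder, approximate_score is a hypothetical helper name, and the data is synthetic; none of these come from the question, so swap in your own.

```python
import numpy as np


def score(subset_df, subset_labels):
    # Placeholder metric (an assumption): replace with the real score function.
    return float(subset_df.mean() + subset_labels.mean())


def approximate_score(df, labels, n_runs=5, sample_size=100, seed=12345):
    # One seeded generator shared by all runs: the whole procedure is
    # reproducible, while each run still draws a different subset.
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_runs):
        # A single index array applied to both arrays keeps rows and labels paired.
        idx = rng.choice(len(df), size=sample_size, replace=False)
        scores.append(score(df[idx], labels[idx]))
    return float(np.mean(scores))


# Synthetic data standing in for the real df and labels.
data_rng = np.random.default_rng(0)
df = data_rng.random((1000, 4))
labels = data_rng.integers(0, 3, size=1000)

first = approximate_score(df, labels)
second = approximate_score(df, labels)
print(first == second)  # same seed, same draws, same average
```

Sharing one generator across the runs, rather than reseeding inside the loop, is what keeps the N subsets different from each other while the overall result stays repeatable.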
