There are two arrays, an array of images and an array of the corresponding labels. (e.g pictures of figures and it's values)
The occurrences in the labels are unevenly distributed.
What I want is to cut both arrays in such a way, that the labels are evenly distributed. E.g. every label occurs 2 times.
To test I've just created two 1D arrays and it was working:
labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1,])
images = np.array(['A','B','C','C','A','B','A','C','A','C','A',])
x, y = zip(*sorted(zip(images, labels)))
label = list(set(y))
new_images = []
new_labels = []
amount = 2
for i in label:
start = y.index(i)
stop = start + amount
new_images = np.append(new_images, x[start: stop])
new_labels = np.append(new_labels, y[start: stop])
What I get/want is this:
new_labels: [ 1. 1. 2. 2. 3. 3.]
new_images: ['A' 'A' 'B' 'B' 'C' 'C']
(It is not necessary, that the arrays are sorted)
But when I tried it with the right data (images.shape = (35000, 32, 32, 3), labels.shape = (35000)) I've got an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This does not help me a lot: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I think that my solution is quite dirty anyhow. Is there a way to do it right?
Thank you very much in advance!