I am given training data and their corresponding labels (integers 1,2,...,9) in two text files. Both text files are sequences of numbers.
The first 500 numbers in the training set correspond to the first data point, the second 500 numbers correspond to the second data point, etc.
I want to extract the subset of training points which have label 2 or label 3. My implementation of this is extremely slow:
import numpy as np
ytrain_old = np.genfromtxt('TrainLabels.txt')
Xtrain_old = np.genfromtxt('Train.txt')
Xtrain = []
ytrain = []
for i in range(10000):
    if (ytrain_old[i]==2) or (ytrain_old[i]==3):
        ytrain.append(ytrain_old[i])
        Xtrain.append([Xtrain_old[i*500:(i+1)*500]])
What would be a faster way to do this? Ideally I would like the result as a pandas DataFrame.
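For reference, here is a minimal sketch of what a vectorized version might look like. It assumes the flat training array holds exactly 500 numbers per data point, in the same order as the labels. The function name `select_labels` and the `label` column name are made up for illustration and are not part of my original code.

```python
import numpy as np
import pandas as pd

def select_labels(X_flat, y, wanted=(2, 3), n_features=500):
    """Return a DataFrame holding only the rows whose label is in `wanted`.

    X_flat     : 1-D sequence of length n_points * n_features
    y          : 1-D sequence of n_points labels
    n_features : numbers per data point (500 in my case)
    """
    # Turn the flat sequence into one row per data point.
    X = np.asarray(X_flat).reshape(-1, n_features)
    y = np.asarray(y).astype(int)

    # Boolean mask that is True where the label is one of the wanted values.
    mask = np.isin(y, wanted)

    # Index both arrays with the same mask, without a Python-level loop.
    df = pd.DataFrame(X[mask])
    df["label"] = y[mask]
    return df

# Possible usage with the files from above (not run here):
# ytrain_old = np.genfromtxt('TrainLabels.txt')
# Xtrain_old = np.genfromtxt('Train.txt')
# train_df = select_labels(Xtrain_old, ytrain_old)
```

The idea is that the reshape and the boolean mask replace the explicit loop over 10000 points, so only the file reading itself remains as a per-element cost.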