I have a dataset X such that X.shape yields (10000, 9). I want to choose a subset of X with the following code:
import numpy as np

X = np.asarray(np.random.normal(size = (10000,9)))
train_fraction = 0.7 # fraction of X that will be marked as train data
train_size = int(X.shape[0]*train_fraction) # fraction converted to number
test_size = X.shape[0] - train_size # remaining rows will be marked as test data
train_ind = np.asarray([False]*X.shape[0])
train_ind[np.random.randint(low = X.shape[0], size = (train_size,))] = True # mark True at 70% of the places
The problem is that np.sum(train_ind) is not the expected value of 7000. Instead it gives values that vary from run to run, such as 5033. I initially thought that np.random.randint(low = X.shape[0], size = (train_size,)) might be the culprit, but when I check np.random.randint(low = X.shape[0], size = (train_size,)).shape I get (7000,).
Where am I going wrong?
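
A likely explanation, sketched here rather than stated as definitive: np.random.randint draws indices with replacement, so the 7000 draws contain repeats and fewer than 7000 distinct positions end up set to True. For 7000 draws from 10000 rows, the expected number of distinct indices is about 5034, which matches the values observed. The snippet below reproduces the effect and then draws the indices without replacement. It uses np.random.default_rng and Generator.choice instead of the original randint call; the seed and the variable names (rng, idx_dup, mask_dup, n_unique, expected_unique) are illustrative assumptions, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, only for reproducibility
n_rows, train_size = 10000, 7000

# Sampling WITH replacement (the behaviour of randint): indices can repeat,
# so fewer than train_size distinct positions are marked True.
idx_dup = rng.integers(low=0, high=n_rows, size=train_size)
mask_dup = np.zeros(n_rows, dtype=bool)
mask_dup[idx_dup] = True
n_unique = int(mask_dup.sum())

# Expected number of distinct indices after train_size draws with replacement.
expected_unique = n_rows * (1 - (1 - 1 / n_rows) ** train_size)

# Sampling WITHOUT replacement: every index is distinct,
# so exactly train_size positions are marked True.
idx = rng.choice(n_rows, size=train_size, replace=False)
train_ind = np.zeros(n_rows, dtype=bool)
train_ind[idx] = True

print(n_unique, round(expected_unique), int(train_ind.sum()))
```

With this mask, X[train_ind] should hold the train rows and X[~train_ind] the remaining test rows. Slicing rng.permutation(n_rows)[:train_size] would be another way to get distinct indices.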