I have a numpy array "my_data" that I am trying to split randomly. However, when I use the following code, the resulting "train" and "test" arrays have some rows in common.
training_idx = np.random.randint(my_data.shape[0], size=split_size)
test_idx = np.random.randint(my_data.shape[0], size=len(my_data)-split_size)
train, test = my_data[training_idx,:], my_data[test_idx,:]
My intention is to pick the rows of the train array randomly first, and then have whatever rows of my_data are not in the train array make up the test array.
Is there a way to do this in numpy? (I would rather not use sklearn to split my data.)
I followed this post to get this far with my dataset: How to split/partition a dataset into training and test datasets for, e.g., cross validation?
If I code per this post’s logic, I end up with train and test datasets that share some rows. I want train and test datasets with no rows in common.
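For what it's worth, the overlap seems to come from np.random.randint drawing each index independently and with replacement, so the same row index can land in both index arrays (or appear twice in one of them). A minimal sketch of the behavior I'm after, assuming a shuffled permutation of the row indices is an acceptable way to get it: shuffle the indices once, then slice them into two disjoint parts. The helper name split_rows, the seed argument, and the toy my_data below are hypothetical and only there for illustration.

```python
import numpy as np

def split_rows(data, split_size, seed=None):
    """Split the rows of `data` into two disjoint random subsets.

    `split_rows` and `seed` are illustrative names, not from any library.
    """
    rng = np.random.default_rng(seed)
    # Each row index appears exactly once in the permutation,
    # so slicing it can never put the same row in both parts.
    shuffled = rng.permutation(data.shape[0])
    training_idx = shuffled[:split_size]
    test_idx = shuffled[split_size:]
    return data[training_idx, :], data[test_idx, :]

# Toy stand-in for my_data: 10 rows, 2 columns, all rows distinct.
my_data = np.arange(20).reshape(10, 2)
train, test = split_rows(my_data, split_size=7, seed=0)
```

With this, train has split_size rows and test has the remaining len(my_data) - split_size rows. Note that this guarantees disjoint row indices; if my_data itself contains duplicate rows, identical values could still show up on both sides.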