X.shape #output is => (2555904, 1024, 2)
X[0] # Output is =>
array([[ 0.0420274 ,  0.23476323],
       [-0.2728826 ,  0.40513492],
       [-0.26707262,  0.22749889],
       ...,
       [-0.7055947 , -0.28693035],
       [-0.41157472,  0.66826206],
       [ 0.06487698,  0.6358149 ]], dtype=float32)
import numpy as np

total = len(X)
n_train = int(0.8 * total)  # 80% of samples for training
n_test = total - n_train    # remaining 20% for testing (avoids losing a sample to rounding)
train_idx = np.random.choice(range(0, total), size=n_train, replace=False) # Randomly selecting 80% of data from total dataset
test_idx = list(set(range(0, total)) - set(train_idx))
train_idx.sort()
test_idx.sort()
X_train = X[train_idx]
X_test = X[test_idx]
I am stuck at the last two lines of this code, i.e., the X_train and X_test part. That part takes a very long time to run. Is there another way to do this? All I want is to split the X data into an 80/20 ratio. Any suggestions are welcome.
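One possible alternative, sketched below on a small synthetic array standing in for X: shuffle the rows in place once and then slice. Slicing returns views, so no copy of the (huge) array is made, unlike fancy indexing with `train_idx`/`test_idx`, which allocates a full copy of each partition. The array shape here is made up for illustration.

```python
import numpy as np

# Small synthetic stand-in for X (the real array is (2555904, 1024, 2)).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4, 2)).astype(np.float32)

n_train = int(0.8 * len(X))

# Shuffle along the first axis in place, then slice.
# The slices are views into X, so no second multi-GB copy is allocated.
rng.shuffle(X, axis=0)
X_train = X[:n_train]  # view, not a copy
X_test = X[n_train:]   # view, not a copy
```

The trade-off is that X itself is reordered, so this only works if you do not need the original row order afterwards.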
The dataset that I am using is RadioML2018.01A.
The link for the same is: https://www.kaggle.com/pinxau1000/radioml2018-01a-get-started/data
I think the main problem is the size of the data. How can I overcome that and split the data?
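If the copies themselves are acceptable and only the index bookkeeping is slow, one thing worth checking is that `test_idx` in the code above is a Python list built from a `set` difference; indexing a large array with a Python list is much slower than with a NumPy integer array. A sketch of a pure-NumPy version of the same split, again on a small synthetic array:

```python
import numpy as np

# Synthetic stand-in for X; the real array is (2555904, 1024, 2).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4, 2)).astype(np.float32)
total = len(X)
n_train = int(0.8 * total)

# One permutation replaces choice + set difference, and both index
# arrays stay as NumPy arrays, so fancy indexing is as fast as it gets.
perm = rng.permutation(total)
train_idx = np.sort(perm[:n_train])
test_idx = np.sort(perm[n_train:])

X_train = X[train_idx]
X_test = X[test_idx]
```

This still copies the data once per partition; if even that copy is too expensive for the RadioML array, the in-place shuffle-and-slice approach avoids it entirely.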