X.shape  # output is => (2555904, 1024, 2)
X[0]     # output is =>
# array([[ 0.0420274 ,  0.23476323],
#        [-0.2728826 ,  0.40513492],
#        [-0.26707262,  0.22749889],
#        ...,
#        [-0.7055947 , -0.28693035],
#        [-0.41157472,  0.66826206],
#        [ 0.06487698,  0.6358149 ]], dtype=float32)
import numpy as np

total = len(X)
n_train = int(0.8 * total)  # 80% of samples for training
n_test = total - n_train    # remaining 20% for testing

# Randomly select 80% of the indices, without replacement
train_idx = np.random.choice(total, size=n_train, replace=False)
# The test set is every index not selected for training
test_idx = np.array(sorted(set(range(total)) - set(train_idx)))
train_idx.sort()

X_train = X[train_idx]  # these two lines are the slow part
X_test = X[test_idx]

I am stuck at the last two lines of this code, i.e. the X_train and X_test part: it takes a very long time to run. Is there a faster way to do this? All I want is to split the X data into an 80/20 ratio. Any suggestions are welcome.

The dataset that I am using is RadioML2018.01A.

The dataset is available at: https://www.kaggle.com/pinxau1000/radioml2018-01a-get-started/data

I think the main problem is the size of the data (about 20 GB). How can I overcome that and split the data?

  • There's no "instant way" to manipulate 20 GB of memory like that. Your best option is to load only the parts you need (i.e. a batch of data); see the sketch below. – Alexey S. Larionov Feb 15 '22 at 13:21
  • Does this answer your question? [How to split/partition a dataset into training and test datasets for, e.g., cross validation?](https://stackoverflow.com/questions/3674409/how-to-split-partition-a-dataset-into-training-and-test-datasets-for-e-g-cros) – jjramsey Feb 15 '22 at 13:23
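A minimal sketch of the batch-loading idea from the first comment, assuming the data comes from the HDF5 file on the linked Kaggle page; the file name and the 'X' key are assumptions, so adjust them to your copy:

import numpy as np
import h5py

with h5py.File("GOLD_XYZ_OSC.0001_1024.hdf5", "r") as f:  # assumed file name
    X_ds = f["X"]             # h5py Dataset handle: nothing is read yet
    total = X_ds.shape[0]

    rng = np.random.default_rng(42)
    train_idx = rng.choice(total, size=int(0.8 * total), replace=False)
    train_idx.sort()          # h5py requires indices in increasing order

    batch_size = 4096
    for start in range(0, len(train_idx), batch_size):
        batch_idx = train_idx[start:start + batch_size]
        X_batch = X_ds[batch_idx]  # only this batch is read from disk
        # ... feed X_batch to your model here ...

This keeps only one batch in RAM at a time instead of materialising a ~16 GB X_train array.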

1 Answer


You can use sklearn.model_selection.train_test_split; check out the official documentation, which includes an example. You have to split the data into the explanatory variables and the target variable first.
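A minimal sketch of that suggestion, assuming you have already extracted a label array y from the dataset (the random_state value is illustrative). Note that, like fancy indexing, train_test_split copies the selected rows, so on its own it does not avoid the memory cost of a ~20 GB X:

from sklearn.model_selection import train_test_split

# 80/20 split, matching the ratio in the question
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True
)

If you only want to split X itself, train_test_split also accepts a single array: X_train, X_test = train_test_split(X, test_size=0.2).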