I have a neural network with two input branches. I want to use sklearn's train_test_split function to split my dataset into training, validation, and test sets. I know that with a single input array I can do the split as follows:

from sklearn.model_selection import train_test_split

X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X, Y, test_size=0.2)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

But if I have two inputs, X1 and X2, how can I split the data so that both inputs (and the labels) are split in unison? Any insights would be appreciated.

John

1 Answer

The first thing I can think of is to zip both inputs together, call train_test_split on the result, and then separate them again:

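import numpy as np
from sklearn.model_selection import train_test_split
X = np.array(list(zip(X1, X2)))  # pair up samples; X1 and X2 need the same per-sample shape
X_train, X_test, y_train, y_test = train_test_split(X, y)
X1_train, X2_train = X_train[:, 0], X_train[:, 1]  # separate the two inputs again
X1_test, X2_test = X_test[:, 0], X_test[:, 1]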

However, this can consume a lot of memory, depending on how much data you have. Another approach, if you are using TensorFlow, is to implement train_test_split using tf.data.Dataset; see this question.
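It may also be worth noting that train_test_split accepts any number of equal-length arrays and applies the same shuffled indices to all of them, so the zipped intermediate array can probably be skipped. Below is a minimal sketch of the train/validation/test split from the question done that way. The toy arrays, their shapes, and random_state=0 are assumptions made up for illustration, not part of the original data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: two input branches with different feature widths
n_samples = 100
X1 = np.arange(n_samples * 3).reshape(n_samples, 3)
X2 = np.arange(n_samples * 5).reshape(n_samples, 5)
Y = np.arange(n_samples)

# Passing all arrays in one call keeps their rows aligned:
# each input array yields a (train, rest) pair, in the order given
X1_train, X1_rest, X2_train, X2_rest, Y_train, Y_rest = train_test_split(
    X1, X2, Y, test_size=0.2, random_state=0
)

# Split the held-out 20% in half to get validation and test sets
X1_val, X1_test, X2_val, X2_test, Y_val, Y_test = train_test_split(
    X1_rest, X2_rest, Y_rest, test_size=0.5, random_state=0
)

print(X1_train.shape, X2_train.shape, Y_train.shape)
print(X1_val.shape, X1_test.shape)
```

Because the split happens in a single call, X1 and X2 do not need to have the same per-sample shape here, which the zip approach requires.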

jcaliz