Sklearn's train_test_split with two inputs and one output

Question

I have a neural network with two input branches. I want to use sklearn's train_test_split function to split my dataset into train, validation and test sets. I know that with a single input array I can do the split as follows:

from sklearn.model_selection import train_test_split

X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X, Y, test_size=0.2)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

But if I have two inputs, X1 and X2, how can I split the data so that both inputs are split in unison? Any insights will be appreciated.

Yeah they are both numpy arrays. One is of shape (40011,38) and the other is of shape (40011,301,4). — John, Mar 24 '21 at 19:38

score 0 · Answer 1 · answered Mar 24 '21 at 19:58

The first thing I can think of is zipping both inputs, using train_test_split, and then separating them again:

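import numpy as np

X = list(zip(X1, X2))  # pair the rows; a plain list avoids building a ragged NumPy array
X_train, X_test, y_train, y_test = train_test_split(X, y)
X1_train, X2_train = (np.stack(part) for part in zip(*X_train))  # unpack each branch
X1_test, X2_test = (np.stack(part) for part in zip(*X_test))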

However, this can consume a lot of memory due to the amount of data you have. Another approach, in case you are using TensorFlow, is to implement train_test_split using tf.data.Dataset; check this question.
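
For what it's worth, train_test_split also accepts more than two arrays and applies the same shuffled indices to each of them, so zipping should not be strictly necessary. A minimal sketch, assuming small random stand-in arrays with the same trailing shapes as in the question (the sample count, the row-id tagging and random_state are illustrative choices, not taken from the question):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (assumed): 100 samples instead of 40011
n = 100
ids = np.arange(n)
X1 = np.random.rand(n, 38)
X2 = np.random.rand(n, 301, 4)
Y = ids % 2

# Tag each row with its id so alignment can be checked after the split
X1[:, 0] = ids
X2[:, 0, 0] = ids

# Every array passed in is split with the same indices, so rows stay in unison
X1_train, X1_rest, X2_train, X2_rest, Y_train, Y_rest = train_test_split(
    X1, X2, Y, test_size=0.2, random_state=0
)

# Split the held-out 20% in half: 10% validation, 10% test
X1_val, X1_test, X2_val, X2_test, Y_val, Y_test = train_test_split(
    X1_rest, X2_rest, Y_rest, test_size=0.5, random_state=0
)
```

The return values come back in pairs (train part, test part) in the same order as the inputs, which is why six names are unpacked for three arrays.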
