Is it possible to split a DataFrame into training and testing sets by specifying the exact number of rows I want in each set, instead of a ratio? Most examples I see use randomSplit with weights.
The sizes I need are:
463715 samples for training
51630 samples for testing
In scikit-learn I was able to do this by passing an integer as test_size, for example:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000, random_state=42)
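To make the behaviour I am after concrete, here is a minimal pure-Python sketch of an exact-size split. It is only an illustration of the semantics, not Spark or scikit-learn code: the helper name split_by_count and the toy data are made up for this example. It shuffles the row indices with a fixed seed and takes the first test_size of them as the test set, so the counts are exact rather than approximate.

```python
import random


def split_by_count(rows, test_size, seed=42):
    """Hypothetical helper: return (train, test) with exactly test_size test rows."""
    if not 0 <= test_size <= len(rows):
        raise ValueError("test_size must be between 0 and len(rows)")
    indices = list(range(len(rows)))
    # Seeded shuffle so the same split is produced on every run.
    random.Random(seed).shuffle(indices)
    test = [rows[i] for i in indices[:test_size]]
    train = [rows[i] for i in indices[test_size:]]
    return train, test


# Toy data standing in for the rows of a DataFrame.
rows = list(range(1000))
train, test = split_by_count(rows, test_size=100)
print(len(train), len(test))  # prints: 900 100
```

What I am looking for is the DataFrame equivalent of this: every row lands in exactly one of the two sets, and the set sizes match the numbers I ask for.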