I would like to write a train_test_split function that splits a matrix (a vector of vectors) of data into two matrices, similar to what sklearn's function does. Here is my attempt:
#include <iostream>
#include <cstdlib>
#include <fstream>
#include <time.h>
#include <vector>
#include <string>
using namespace std;
vector<vector<float>> train_test_split(vector<vector<float>> df, float train_size = 0.8){
    vector<vector<float>> train;
    vector<vector<float>> test;
    srand(time(NULL));
    for(int i = 0; i < df.size(); i++){
        int x = rand() % 10 + 1;
        if(x <= train_size * 10){
            train.push_back(df[i]);
        }
        else{
            test.push_back(df[i]);
        }
    }
    return train, test;
}

int main(){
    vector<vector<float>> train;
    vector<vector<float>> test;
    vector<vector<float>> df = {{1,2,3,4},
                                {5,6,7,8},
                                {9,10,11,12}};
    train, test = train_test_split(df);
    cout << "training size: " << train.size() << ", test size: " << test.size() << endl;
    return 0;
}
With this approach, only the test matrix receives any data. After some research, I found that a C++ function cannot return two separate values this way. I am very new to C++ and am wondering what the best way to approach this would be. Any help would be appreciated.
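For reference, one common way around the single-return-value limit is to bundle both matrices into a std::pair. The sketch below is only an illustration under some assumptions: the Matrix alias, the seed parameter, and the shuffle-then-cut strategy are my own choices and not part of the attempt above, and the rounding of the train size is simple truncation rather than whatever sklearn does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical alias, introduced here only for readability.
using Matrix = std::vector<std::vector<float>>;

// Returns {train, test}. The row order is shuffled, then the first
// train_size fraction of rows (truncated) goes to train and the rest to test.
// The seed parameter is an assumption added so that runs can be reproduced.
std::pair<Matrix, Matrix> train_test_split(const Matrix& df,
                                           float train_size = 0.8f,
                                           unsigned seed = std::random_device{}()) {
    assert(train_size >= 0.0f && train_size <= 1.0f);

    // Build the row indices 0..n-1 and shuffle them.
    std::vector<std::size_t> idx(df.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 gen(seed);
    std::shuffle(idx.begin(), idx.end(), gen);

    // Number of rows that go into the training matrix.
    const auto n_train = static_cast<std::size_t>(train_size * df.size());

    Matrix train, test;
    for (std::size_t i = 0; i < idx.size(); ++i) {
        (i < n_train ? train : test).push_back(df[idx[i]]);
    }
    // Both matrices travel back to the caller inside one pair object.
    return {std::move(train), std::move(test)};
}
```

With a C++17 compiler, the caller can unpack the result with structured bindings, for example auto [train, test] = train_test_split(df); on older standards, result.first and result.second (or std::tie) do the same job. In the original attempt, the comma operator in return train, test; and in train, test = ... evaluates the left operand, discards it, and uses only the right one, which is why only test received data.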