I am working with a custom dataset that I had to build myself. The resulting CSV has the following format:
Wall# Target Feature1 Feature2 Feature3 Feature4 ...
1 Yes <float> <float> <float>
2 No
3 Maybe
I have tried the following:
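import numpy as np
import pandas as pd

df = pd.read_csv("path_to_csv")
df['split'] = np.random.randn(df.shape[0], 1)  # adds a random 'split' column to df
msk = np.random.rand(len(df)) <= 0.7  # boolean mask, True for roughly 70% of rows
train = df[msk]
test = df[~msk]
train.to_csv('train_coeffs.csv', index=False)
test.to_csv('test_coeffs.csv', index=False)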
But the output files are messed up, with changed data values. What is the most efficient way to randomly split the dataset into a 70-30 train-test split?
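
For reference, a 70-30 split of this kind could be sketched with pandas alone, as below. This is only a minimal sketch under some assumptions: the helper name split_train_test, the seed value, and the small stand-in DataFrame are hypothetical and not part of my real data, and it assumes the DataFrame index is unique (the default after read_csv). It uses DataFrame.sample to pick 70% of the rows at random and DataFrame.drop to put the remaining rows in the test set, without adding any columns or altering values.

```python
import pandas as pd

def split_train_test(df, train_frac=0.7, seed=0):
    """Return (train, test) with about train_frac of the rows in train."""
    # sample() picks rows at random without replacement; values are left untouched
    train = df.sample(frac=train_frac, random_state=seed)
    # every row whose index label was not sampled goes to the test set
    test = df.drop(train.index)
    return train, test

# Hypothetical stand-in for the real CSV (replace with pd.read_csv(...))
df = pd.DataFrame({
    "Wall#": range(1, 11),
    "Target": ["Yes", "No", "Maybe", "Yes", "No", "Maybe", "Yes", "No", "Yes", "No"],
    "Feature1": [0.1 * i for i in range(10)],
})

train, test = split_train_test(df)
```

The two frames can then be written out with to_csv(..., index=False) as in the attempt above. Fixing the seed makes the split reproducible between runs; passing a different seed gives a different random split.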