I am working with a custom dataset that I had to build myself. The resulting CSV has the following format:

Wall#    Target    Feature1    Feature2    Feature3    Feature4    ...
1         Yes       <float>     <float>     <float>
2         No
3         Maybe

I have tried the following:

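import numpy as np
import pandas as pd

df = pd.read_csv("path_to_csv")
df['split'] = np.random.randn(df.shape[0], 1)
msk = np.random.rand(len(df)) <= 0.7  # True for roughly 70% of the rows
train = df[msk]   # rows selected by the mask
test = df[~msk]   # remaining rows
train.to_csv('train_coeffs.csv', index=False)
test.to_csv('test_coeffs.csv', index=False)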

But the output files contain altered data values instead of the original rows. What is the most efficient way to randomly split the dataset into a 70-30 train-test set?

thegravity
  • [sklearn.model_selection.train_test_split](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) - http://idownvotedbecau.se/noresearch/ – desertnaut Jun 26 '18 at 13:28
  • I hope dupe is correct, if not, let me know, I can reopen. – jezrael Jun 26 '18 at 13:30
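
A minimal sketch of the `train_test_split` approach mentioned in the comments, assuming scikit-learn is installed. The small DataFrame here is made-up sample data standing in for the CSV from the question, the helper name `split_70_30` is hypothetical, and the seed value 42 is arbitrary; none of these come from the original post.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv("path_to_csv"); values are invented.
df = pd.DataFrame({
    "Wall#": range(1, 11),
    "Target": ["Yes", "No", "Maybe", "Yes", "No", "Maybe", "Yes", "No", "Maybe", "Yes"],
    "Feature1": [0.1 * i for i in range(10)],
    "Feature2": [1.5 * i for i in range(10)],
})


def split_70_30(frame, seed=42):
    """Shuffle the rows and return (train, test) holding about 70% / 30% of them.

    The rows are only selected, never modified, so the values in both
    parts are the same as in the input frame.
    """
    return train_test_split(frame, test_size=0.3, random_state=seed)


train, test = split_70_30(df)

# Writing the two parts out, as in the question (commented out to avoid side effects):
# train.to_csv("train_coeffs.csv", index=False)
# test.to_csv("test_coeffs.csv", index=False)
```

Fixing `random_state` makes the split reproducible between runs; dropping it gives a different random split each time. If the class balance of `Target` matters, `train_test_split` also accepts a `stratify` argument, though whether that is appropriate depends on the data.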
