I am trying to split the data into a test set and a train set without using the sklearn library, but when I run the function, this error sometimes occurs and sometimes doesn't. Can anyone help me? Following is the code:
Does this answer your question? [Train test split without using scikit learn](https://stackoverflow.com/questions/47202182/train-test-split-without-using-scikit-learn) – rayryeng Jan 23 '22 at 02:42
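For comparison with the linked question, the split can also be done without any loop. The sketch below is one common approach, not taken from either post. It draws the test rows once with `DataFrame.sample`, which samples without replacement, and then drops them by label to get the training set. `split_sampled` and its `seed` parameter are hypothetical names. The sketch assumes the frame's index labels are unique, because `drop` works on labels.

```python
import numpy as np
import pandas as pd

def split_sampled(data: pd.DataFrame, test_split_ratio: float, seed=None):
    """Hypothetical helper: randomly split `data` into (train_df, test_df) without scikit-learn."""
    # Round up, like a loop that keeps drawing while len(test) < ratio * n
    test_size = int(np.ceil(test_split_ratio * len(data)))
    # Draw all test rows in one call; sample() picks rows without replacement
    test_df = data.sample(n=test_size, random_state=seed)
    # Every row not drawn for the test set goes to the training set (assumes unique labels)
    train_df = data.drop(test_df.index)
    return train_df, test_df

df = pd.DataFrame(np.random.uniform(0, 1, (77, 7)))
train, test = split_sampled(df, 0.2, seed=0)
print([train.shape, test.shape])  # [(61, 7), (16, 7)]
```

Rounding the test size up gives 16 test rows out of 77 at a ratio of 0.2, the same count as a loop that stops once it has at least `0.2 * 77` rows.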
You get that error because `data_size` is always the number of rows in your original data frame. Look at these lines:

```python
indexes = randrange(data_size)
# [...]
train_df = train_df.drop(train_df.index[[indexes]])
```

With every iteration, you remove a row from `train_df`, so it shrinks. However, `indexes` is still drawn from the full row count and can point past the last row of `train_df`. That only happens when `randrange` draws a large enough number, which is why the error shows up only some of the time. I'm not sure why you want to do it this way, but your code will work if you sample from `train_df` instead:
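```python
import numpy as np
import pandas as pd
from random import randrange

df = pd.DataFrame(np.random.uniform(0, 1, (77, 7)))

def split(data, test_split_ratio):
    test_rows = []  # single-row frames drawn for the test set
    test_size = test_split_ratio * data.shape[0]
    train_df = data.copy()
    while len(test_rows) < test_size:
        # Recompute the size every iteration: train_df shrinks by one row each time
        data_size = train_df.shape[0]
        indexes = randrange(data_size)  # always a valid position in the current train_df
        # Take the row from train_df (not data), so the test row is exactly the one dropped below
        test_rows.append(train_df.iloc[[indexes]])
        train_df = train_df.drop(train_df.index[[indexes]])
    # DataFrame.append was removed in pandas 2.0, so build test_df with one pd.concat
    test_df = pd.concat(test_rows) if test_rows else data.iloc[0:0]
    return train_df, test_df
```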
Looks like this:

```python
train, test = split(df, 0.2)
print([train.shape, test.shape])
# [(61, 7), (16, 7)]
```
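To see why the original version failed only some of the time, the failure can be reproduced deterministically with a tiny frame. This sketch rests on two assumptions, because the question doesn't show its code or the error message. First, that the error is an `IndexError` from an out-of-range position. Second, that `data_size` was computed once from the full frame, as described above. After k rows have been dropped, only draws in the last k positions fail, so early iterations usually succeed and a given run may or may not hit a bad draw. The 5-row frame and the hard-coded draw of 4 are hypothetical stand-ins for the real data and for what `randrange` might return.

```python
import pandas as pd

# Hypothetical 5-row frame standing in for the question's data
data = pd.DataFrame({"x": range(5)})
data_size = data.shape[0]  # computed once from the full frame: 5, never updated

train_df = data.copy()
# After the first loop iteration one row has been dropped, so 4 rows remain
train_df = train_df.drop(train_df.index[[0]])

# randrange(data_size) can return 4: a valid position for the original 5 rows,
# but one past the end of the 4-row train_df. Hard-coded here so the failure is repeatable.
indexes = 4

raised = False
try:
    train_df = train_df.drop(train_df.index[[indexes]])
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```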

StupidWolf