
I am trying to split my data into a training set and a test set without using the sklearn library. Sometimes when I run the function this error occurs, and other times it runs without any error. Can anyone help me? The code is below.

[Image: screenshot of the code, not reproduced here]

StupidWolf
The Yk
    Does this answer your question? [Train test split without using scikit learn](https://stackoverflow.com/questions/47202182/train-test-split-without-using-scikit-learn) – rayryeng Jan 23 '22 at 02:42

1 Answer


You get that error because data_size is always the number of rows in your original data frame, and with these lines:

indexes = randrange(data_size)
[...]
train_df = train_df.drop(train_df.index[[indexes]])
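To see why this fails only some of the time, here is a minimal sketch that isolates the positional lookup. It assumes the error in question is an out-of-bounds IndexError; the 10-row frame and the try_drop helper are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(0, 1, (10, 3)))

# Simulate train_df after three rows have already been dropped: 7 rows left.
train_df = df.drop(df.index[[0, 1, 2]])

# The size of the original frame, which no longer matches train_df.
data_size = df.shape[0]  # 10

def try_drop(position):
    """Hypothetical helper: True if the positional lookup works, False on IndexError."""
    try:
        train_df.drop(train_df.index[[position]])
        return True
    except IndexError:
        return False

print(try_drop(3))              # position inside the 7 remaining rows -> True
print(try_drop(data_size - 1))  # position 9 is valid for df but not for train_df -> False
```

Whether a given run fails depends on whether randrange happens to return a position past the end of the shrunken frame, which is why the error appears intermittently.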

train_df shrinks with every iteration, but indexes is still drawn from the full original range, so it can point past the end of train_df. I'm not sure why you want to do it this way, but sampling from train_df instead makes your code work:

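import numpy as np
import pandas as pd
from random import randrange

df = pd.DataFrame(np.random.uniform(0, 1, (77, 7)))

def split(data, test_split_ratio):
    test_size = test_split_ratio * data.shape[0]
    train_df = data.copy()
    test_rows = []  # single-row frames moved out of train_df

    while len(test_rows) < test_size:
        # Recompute the size on every pass so the position is always valid for train_df
        data_size = train_df.shape[0]
        indexes = randrange(data_size)
        # Take the row from train_df (not data) so the same row is never picked twice
        test_rows.append(train_df.iloc[[indexes]])
        train_df = train_df.drop(train_df.index[[indexes]])

    # DataFrame.append was removed in pandas 2.0, so build test_df with pd.concat
    test_df = pd.concat(test_rows) if test_rows else data.iloc[:0]
    return train_df, test_df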

Looks like this:

train,test = split(df,0.2)

print([train.shape,test.shape])
[(61, 7), (16, 7)]
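
If the row-by-row loop is not a requirement, shuffling the row positions once and slicing is a common alternative. The following is only a sketch under that assumption, not part of the code above: split_by_permutation and its seed parameter are names chosen here for illustration, and the test size is rounded up so it matches what the loop produces.

```python
import numpy as np
import pandas as pd

def split_by_permutation(data, test_split_ratio, seed=None):
    """Hypothetical alternative: shuffle row positions once, then slice."""
    rng = np.random.default_rng(seed)
    # Round up, like the while loop, which stops once len(test) >= ratio * n
    n_test = int(np.ceil(test_split_ratio * len(data)))
    positions = rng.permutation(len(data))
    test_df = data.iloc[positions[:n_test]]
    train_df = data.iloc[positions[n_test:]]
    return train_df, test_df

df = pd.DataFrame(np.random.uniform(0, 1, (77, 7)))
train, test = split_by_permutation(df, 0.2, seed=0)
print(train.shape, test.shape)  # (61, 7) (16, 7)
```

Because every position appears exactly once in the permutation, the two frames cannot share a row, and passing a seed makes the split reproducible.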
StupidWolf