How to split data into a test set and a train set?

Question

I am trying to split my data into a test set and a train set without using the sklearn library. Sometimes when I run the function an error occurs, but other times it runs without any error. Can anyone help me? Following is the code.

Does this answer your question? [Train test split without using scikit learn](https://stackoverflow.com/questions/47202182/train-test-split-without-using-scikit-learn) — rayryeng, Jan 23 '22 at 02:42

score 0 · Answer 1 · answered Jan 23 '22 at 10:24

You get that error because data_size is always the number of rows in your original data frame, and with these lines:

indexes = randrange(data_size)
[...]
train_df = train_df.drop(train_df.index[[indexes]])

you shrink train_df on every iteration, so indexes can be larger than the number of rows left in train_df. I'm not sure why you want to do it this way, but drawing the random position from train_df on each iteration makes your code work:

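import numpy as np
import pandas as pd
from random import randrange

df = pd.DataFrame(np.random.uniform(0, 1, (77, 7)))

def split(data, test_split_ratio):
    test_size = test_split_ratio * data.shape[0]
    train_df = data.copy()
    test_rows = []

    while len(test_rows) < test_size:
        # Pick a position among the rows still left in train_df
        indexes = randrange(train_df.shape[0])
        # Take the row from train_df (not data) so it is the same row dropped below
        test_rows.append(train_df.iloc[[indexes]])
        train_df = train_df.drop(train_df.index[[indexes]])

    # DataFrame.append was removed in pandas 2.0, so concatenate the rows instead
    test_df = pd.concat(test_rows) if test_rows else data.iloc[:0]
    return train_df, test_df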

Running it looks like this:

train,test = split(df,0.2)

print([train.shape,test.shape])
[(61, 7), (16, 7)]
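
If the row-by-row loop is not a requirement, the same kind of split can be sketched without a loop using only pandas. This is an alternative to the function above, not the asker's original approach: the name `split_by_sampling` and the `seed` parameter are made up for illustration, and the sketch assumes the index labels of the data frame are unique. Note that `DataFrame.sample` rounds the test size rather than rounding up, so the two parts may differ by one row from the loop version.

```python
import numpy as np
import pandas as pd

def split_by_sampling(data, test_split_ratio, seed=None):
    # Hypothetical helper: draw the test rows in one call, without replacement
    test_df = data.sample(frac=test_split_ratio, random_state=seed)
    # Every row whose index label was not drawn stays in the training set
    # (assumes index labels are unique)
    train_df = data.drop(test_df.index)
    return train_df, test_df

df = pd.DataFrame(np.random.uniform(0, 1, (77, 7)))
train, test = split_by_sampling(df, 0.2, seed=0)
print([train.shape, test.shape])
```

Passing a fixed `seed` makes the split reproducible between runs; leaving it as `None` gives a different split each time.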
