Questions tagged [k-fold]

A technique in cross-validation where the data is partitioned into k subsets (or "folds"). The model is trained on k-1 folds and evaluated on the remaining fold, and the process is repeated k times, leaving out a different fold for evaluation each time.
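For illustration, a minimal sketch of the scheme using scikit-learn's KFold (the toy data here is made up):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
kf = KFold(n_splits=5)

# Each iteration holds out a different fold of 2 samples for evaluation.
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train={train_idx}, test={test_idx}")
```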

284 questions
27
votes
3 answers

StratifiedKFold vs KFold in scikit-learn

I use this code to test KFold and StratifiedKFold. import numpy as np from sklearn.model_selection import KFold,StratifiedKFold X = np.array([ [1,2,3,4], [11,12,13,14], [21,22,23,24], [31,32,33,34], [41,42,43,44], …
user9270170
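A minimal sketch of the difference the question is probing, with made-up labels: KFold splits by position and ignores y, while StratifiedKFold preserves the class ratio of y in every fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 1, 1, 1])

# KFold ignores the labels, so a test fold can be all one class.
for _, test_idx in KFold(n_splits=3).split(X):
    print("KFold test labels:", y[test_idx])

# StratifiedKFold puts one sample of each class in every test fold here.
for _, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    print("StratifiedKFold test labels:", y[test_idx])
```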
25
votes
3 answers

Separate pandas dataframe using sklearn's KFold

I had obtained the index of training set and testing set with code below. df = pandas.read_pickle(filepath + filename) kf = KFold(n_splits = n_splits, shuffle = shuffle, random_state = randomState) result = next(kf.split(df), None) #train can be…
Mervyn Lee
  • 1,957
  • 4
  • 28
  • 54
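A sketch of the pattern the question describes, on a toy dataframe: kf.split yields positional index arrays, so the frame is sliced with .iloc.

```python
import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# split() yields positional indices, so slice the frame with .iloc.
train_idx, test_idx = next(kf.split(df))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
```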
9
votes
1 answer

MemoryError: Unable to allocate 30.4 GiB for an array with shape (725000, 277, 76) and data type float64

It gives that memory error, but memory capacity is never reached. I have 60 GB of RAM on the SSH server, and processing the full dataset consumes 30. I am trying to train an autoencoder with k-fold. Without k-fold the training works fine. The raw dataset…
iftekm
  • 93
  • 1
  • 1
  • 3
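Two common mitigations for this kind of allocation failure, sketched with a small illustrative shape (not the poster's data): store features as float32 to halve the footprint, or back the array with a disk memmap so it is never fully resident in RAM.

```python
import numpy as np

# float32 halves the memory footprint relative to float64.
data = np.random.rand(100, 277, 76).astype(np.float32)

# A disk-backed memmap avoids one huge in-memory allocation; the OS pages
# slices in and out on demand.
mm = np.memmap("features.dat", dtype=np.float32, mode="w+",
               shape=(100, 277, 76))
mm[:] = data
mm.flush()
```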
8
votes
2 answers

Forcing sklearn cross val score to use stratified k fold?

Based on Sklearn Docs: Is it possible to force the use of StratifiedKFold? How can I know which KFold has been used?
Marine Galantin
  • 1,634
  • 1
  • 17
  • 28
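A sketch of the usual answer: with an integer cv and a classifier, cross_val_score already uses stratified folds, and passing a splitter object pins the strategy explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=100, random_state=0)

# Passing a splitter object (rather than an int) forces StratifiedKFold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(LogisticRegression(), X, y, cv=cv))
```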
8
votes
4 answers

Cross validation for MNIST dataset with pytorch and sklearn

I am new to PyTorch and am trying to implement a feed-forward neural network to classify the MNIST dataset. I have some problems when trying to use cross-validation. My data has the following shapes: x_train: torch.Size([45000, 784]) and y_train:…
Kimmen
  • 183
  • 1
  • 1
  • 8
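One common pattern, sketched with random stand-in tensors (the question's real shape is 45000 x 784): let sklearn's KFold generate the index arrays and index the PyTorch tensors with them.

```python
import torch
from sklearn.model_selection import KFold

x_train = torch.randn(100, 784)         # stand-in for the MNIST features
y_train = torch.randint(0, 10, (100,))  # stand-in for the labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, va_idx) in enumerate(kf.split(x_train)):
    # PyTorch tensors accept the NumPy index arrays sklearn produces.
    x_tr, y_tr = x_train[tr_idx], y_train[tr_idx]
    x_va, y_va = x_train[va_idx], y_train[va_idx]
    # ... build a fresh model each fold, train on (x_tr, y_tr), validate on (x_va, y_va)
```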
8
votes
1 answer

How to do groupKfold validation and have balanced data?

I'm splitting some data into train and test sets according to group values. How can I do this in order to have balanced data? To solve a binary classification task I have 100 samples, each one with a unique ID, a subject, and a label (1 or 0). In…
Albe
  • 109
  • 4
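A sketch of one answer, assuming scikit-learn 1.0+ where StratifiedGroupKFold is available: it keeps each group in a single fold while approximately preserving the label ratio (the data here is made up).

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)         # binary labels
groups = rng.integers(0, 20, size=100)   # subject IDs

sgkf = StratifiedGroupKFold(n_splits=5)
for train_idx, test_idx in sgkf.split(X, y, groups):
    # No subject appears on both the train and test side of a split.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```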
7
votes
1 answer

How to measure xgboost regressor accuracy using accuracy_score (or other suggested function)

I'm writing code to solve a simple problem: predicting the probability of an item being missing from an inventory. I'm using the XGBoost prediction model to do this. I have the data split into two .csv files, one with the Train Data and the other with the Test…
Pedro Nader
  • 75
  • 1
  • 1
  • 5
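The usual resolution, sketched on synthetic data (assuming the xgboost package is installed): accuracy_score is a classification metric, so a regressor such as XGBRegressor is scored with continuous-error metrics instead.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBRegressor(n_estimators=50).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Regression outputs are continuous, so score with error metrics,
# not accuracy_score.
print("MSE:", mean_squared_error(y_te, pred))
print("R2: ", r2_score(y_te, pred))
```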
6
votes
1 answer

Getting TypeError: Singleton array array(None, dtype=object) cannot be considered a valid collection

I am using different cross-validation methods. I first used the k-fold method in my code and it worked perfectly well, but when I use the RepeatedStratifiedKFold method it gives me this error: TypeError: Singleton array array(None, dtype=object) cannot be…
Rao Kiran
  • 61
  • 1
  • 4
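This TypeError is usually raised when X or y is still None by the time it reaches the splitter. A minimal working sketch of RepeatedStratifiedKFold on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Real arrays, not None: the "Singleton array array(None, ...)" error
# typically means a None slipped in as X or y.
X, y = make_classification(n_samples=100, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
print(cross_val_score(LogisticRegression(), X, y, cv=cv).mean())
```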
5
votes
1 answer

Should I put shuffle=True or False in sklearn KFold cross validation?

I'm studying some cross-validation scores on my dataset using cross_val_score and KFold. In particular, my code looks like this: cross_val_score(estimator=model, X=X, y=y, scoring='r2', cv=KFold(shuffle=True)) My question is if it's a common…
James Arten
  • 523
  • 5
  • 16
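A sketch of the common recommendation: shuffle when the rows may be ordered (by class, time, or collection batch), and fix random_state so the shuffled splits stay reproducible.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# shuffle=True guards against ordered rows landing in contiguous folds;
# random_state makes the shuffled splits reproducible across runs.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(Ridge(), X, y, scoring="r2", cv=cv))
```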
5
votes
0 answers

StandardScaler to whole training dataset or to individual folds for Cross Validation

I'm currently using cross_val_score and KFold to assess the impact of using StandardScaler at different points within data pre-processing, specifically whether scaling the entire training dataset prior to performing cross validation introduces data…
AlexTerry
  • 61
  • 3
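The standard leak-free pattern, sketched on a built-in dataset: put the scaler inside a Pipeline so it is refit on each training fold only, and the held-out fold never influences the scaling statistics.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler is fit inside each training fold, so no information from the
# held-out fold leaks into the scaling statistics.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
cv = KFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(pipe, X, y, cv=cv))
```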
4
votes
1 answer

Huggingface Trainer(): K-Fold Cross Validation

I am following this tutorial from TowardsDataScience for text classification using the Huggingface Trainer. To get a more robust model I want to do k-fold cross-validation, but I am not sure how to do this with the Huggingface Trainer. Is there a built-in…
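There is no built-in k-fold mode in Trainer; the usual pattern is an outer loop that builds a fresh Trainer per fold. A hedged sketch where `tokenized_ds` (a datasets.Dataset), `model_init`, and `training_args` are placeholders for the poster's own objects:

```python
from sklearn.model_selection import KFold
from transformers import Trainer

kf = KFold(n_splits=5, shuffle=True, random_state=0)
indices = list(range(len(tokenized_ds)))  # tokenized_ds: placeholder Dataset

for fold, (train_idx, val_idx) in enumerate(kf.split(indices)):
    trainer = Trainer(
        model_init=model_init,   # placeholder: returns fresh model weights
        args=training_args,      # placeholder TrainingArguments
        train_dataset=tokenized_ds.select(train_idx),
        eval_dataset=tokenized_ds.select(val_idx),
    )
    trainer.train()
    print(f"fold {fold}:", trainer.evaluate())
```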
4
votes
1 answer

How to implement K-Fold Cross validation using Image data generator and using Flow from dataframe (using CSV file)

Please show or explain a dummy example code snippet demonstrating K-Fold Cross Validation with Flow_from_Dataframe, Training_Generator, and Valid_Generator objects for Keras. This is the current code I have (no k-fold only simple fitting…
Bhuvan S
  • 213
  • 1
  • 4
  • 10
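A hedged sketch of one way to combine the two, where `df` (with "filename" and "class" columns) and `img_dir` are placeholders rather than the poster's actual variables: split the dataframe with KFold and build a fresh pair of generators per fold.

```python
from sklearn.model_selection import KFold
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (tr_idx, va_idx) in enumerate(kf.split(df)):
    # A fresh generator per fold, each reading its own slice of the frame.
    train_gen = datagen.flow_from_dataframe(
        df.iloc[tr_idx], directory=img_dir,
        x_col="filename", y_col="class", class_mode="categorical")
    valid_gen = datagen.flow_from_dataframe(
        df.iloc[va_idx], directory=img_dir,
        x_col="filename", y_col="class", class_mode="categorical")
    # ... build a fresh model, then model.fit(train_gen, validation_data=valid_gen)
```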
4
votes
1 answer

Does GridSearchCV return the best_estimator_ after fitting?

Let's say we tune an SVM with GridSearch like this: algorithm = SVM() parameters = {'kernel': ['rbf', 'sigmoid'], 'C': [0.1, 1, 10]} grid= GridSearchCV(algorithm, parameters) grid.fit(X, y) You then wish to use the best fit parameters/estimator in…
Bram Vanroy
  • 27,032
  • 24
  • 137
  • 239
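A sketch of the documented behavior: with the default refit=True, GridSearchCV refits best_estimator_ on the full data with the winning parameters, so it is ready for prediction.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)

parameters = {"kernel": ["rbf", "sigmoid"], "C": [0.1, 1, 10]}
grid = GridSearchCV(SVC(), parameters)  # refit=True by default
grid.fit(X, y)

# best_estimator_ is already refit on all of (X, y) with the best parameters.
print(grid.best_params_)
print(grid.best_estimator_.predict(X[:5]))
```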
3
votes
2 answers

Application and Deployment of K-Fold Cross-Validation

K-Fold Cross Validation is a technique applied for splitting up the data into K number of Folds for testing and training. The goal is to estimate the generalizability of a machine learning model. The model is trained K times, once on each train fold…
notMyName
  • 690
  • 2
  • 6
  • 17
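A sketch of the usual two-step answer on a built-in dataset: k-fold CV is only used to estimate generalization performance, and the deployed model is then refit once on all available data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# Step 1: k-fold CV estimates how well this model class generalizes.
print("estimated accuracy:", cross_val_score(model, X, y, cv=5).mean())

# Step 2: for deployment, fit one final model on all available data.
model.fit(X, y)
```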
3
votes
2 answers

difference between cross_val_score and KFold

I am learning machine learning and I have a doubt. Can anyone tell me the difference between from sklearn.model_selection import cross_val_score and from sklearn.model_selection import KFold? I think both are used for k-fold cross…
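A sketch of the distinction on a built-in dataset: KFold is a splitter that only generates train/test indices, while cross_val_score runs the whole fit-and-score loop (and can take a KFold object as its cv argument).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# KFold only generates index splits ...
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# ... while cross_val_score fits and scores the model on every split.
print(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf))
```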