A technique in cross-validation where the data is partitioned into k subsets (or "folds"). In each round, k-1 folds are used for training and the remaining fold for evaluation; the process is repeated k times, holding out a different fold for evaluation each time.
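As a rough illustration of the splitting described above, the following sketch builds the k train/test index pairs by hand with NumPy, without shuffling. The function name `k_fold_indices` is hypothetical; in practice scikit-learn's `KFold` produces these splits for you.

```python
import numpy as np


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Samples are split into k contiguous folds (no shuffling); each fold is
    held out exactly once while the other k-1 folds form the training set.
    """
    indices = np.arange(n_samples)
    folds = np.array_split(indices, k)  # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print("train:", train_idx, "test:", test_idx)
```

When n_samples is not divisible by k, `np.array_split` makes the first folds one sample larger, which matches how `KFold` sizes its folds.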
Questions tagged [k-fold]
284 questions
27 votes, 3 answers
StratifiedKFold vs KFold in scikit-learn
I use this code to test KFold and StratifiedKFold.
import numpy as np
from sklearn.model_selection import KFold,StratifiedKFold
X = np.array([
[1,2,3,4],
[11,12,13,14],
[21,22,23,24],
[31,32,33,34],
[41,42,43,44],
…
user9270170
25 votes, 3 answers
Separate pandas dataframe using sklearn's KFold
I obtained the indices of the training and test sets with the code below.
df = pandas.read_pickle(filepath + filename)
kf = KFold(n_splits=n_splits, shuffle=shuffle, random_state=randomState)
result = next(kf.split(df), None)
#train can be…

Mervyn Lee
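A sketch under the assumption that `df` is an ordinary DataFrame (a small hypothetical one stands in for the pickled data here): `kf.split(df)` yields positional index arrays, so rows are selected with `.iloc` rather than `.loc`. Taking `next(kf.split(df), None)`, as in the question, returns only the first of these splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical DataFrame standing in for the pickled data in the question.
df = pd.DataFrame({"feature": np.arange(10), "target": np.arange(10) % 2})

kf = KFold(n_splits=5, shuffle=True, random_state=0)
folds = []
for train_idx, test_idx in kf.split(df):
    # split() yields positional indices, so select rows with .iloc, not .loc.
    train_df = df.iloc[train_idx]
    test_df = df.iloc[test_idx]
    folds.append((train_df, test_df))

train_df, test_df = folds[0]
print(len(train_df), len(test_df))
```

Using `.iloc` matters when the DataFrame has a non-default index, where `.loc` would look up labels instead of positions.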
9 votes, 1 answer
MemoryError: Unable to allocate 30.4 GiB for an array with shape (725000, 277, 76) and data type float64
It raises that memory error, but memory capacity is never reached. I have 60 GB of RAM on the SSH machine and the full dataset process consumes 30
I am trying to train an autoencoder with k-fold. Without k-fold, training works fine. The raw dataset…

iftekm
8 votes, 2 answers
Forcing sklearn cross_val_score to use stratified k-fold?
Based on the sklearn docs:
Is it possible to force the use of StratifiedKFold?
How can I know which KFold has been used?

Marine Galantin
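A sketch, assuming a classifier and the bundled iris dataset: passing a `StratifiedKFold` object as `cv` forces stratification explicitly, and `sklearn.model_selection.check_cv` shows which splitter an integer `cv` is turned into. Per the scikit-learn docs, an integer `cv` becomes `StratifiedKFold` when the estimator is a classifier and `y` is binary or multiclass, and `KFold` otherwise (in both cases without shuffling).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, check_cv, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Passing a splitter object makes the fold strategy explicit (and lets you
# control shuffling) instead of relying on how an integer cv is interpreted.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean())

# Which splitter would cv=5 become?
print(type(check_cv(5, y, classifier=True)).__name__)   # StratifiedKFold
print(type(check_cv(5, y, classifier=False)).__name__)  # KFold
```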
8 votes, 4 answers
Cross-validation for the MNIST dataset with PyTorch and sklearn
I am new to PyTorch and am trying to implement a feed-forward neural network to classify the MNIST dataset. I have some problems when trying to use cross-validation. My data has the following shapes:
x_train: torch.Size([45000, 784]) and y_train:…

Kimmen
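A sketch of one way to get per-fold indices for the 45000 training samples, assuming the split is done on sample indices only. No PyTorch code runs here; the comments describe a hypothetical way to plug the indices into it.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 45000  # matches x_train's first dimension in the question
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kf.split(np.arange(n_samples))):
    # Hypothetical PyTorch usage: index the tensors with
    # torch.from_numpy(train_idx), or wrap a Dataset in
    # torch.utils.data.Subset(dataset, train_idx.tolist()).
    # A fresh model and optimizer would be created here for every fold.
    fold_sizes.append((len(train_idx), len(val_idx)))
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation")
```

Re-creating the model inside the loop keeps weights learned on one fold from leaking into the evaluation of another.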
8 votes, 1 answer
How to do GroupKFold validation and have balanced data?
I'm splitting some data into train and test sets according to group values. How can I do this so that the data is balanced?
To solve a binary classification task I have 100 samples, each with a unique ID, a subject and a label (1 or 0).
In…

Albe
7 votes, 1 answer
How to measure XGBoost regressor accuracy using accuracy_score (or another suggested function)
I'm writing code to solve a simple problem: predicting the probability of an item missing from an inventory.
I'm using the XGBoost prediction model to do this.
I have the data split into two .csv files, one with the training data and the other with the test…

Pedro Nader
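A sketch with hypothetical numbers: `accuracy_score` is a classification metric, so regression outputs are usually scored with metrics such as `mean_absolute_error` or `r2_score`. The predictions here are made up (they could come from something like `XGBRegressor.predict`), and thresholding at 0.5 is shown only as an assumed option for the case where the target is really a 0/1 label.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, r2_score

# Hypothetical targets and regressor outputs.
y_true = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
y_pred = np.array([0.1, 0.8, 0.6, 0.3, 0.4])

# accuracy_score rejects continuous predictions with a ValueError.
try:
    accuracy_score(y_true, y_pred)
except ValueError as err:
    print("accuracy_score:", err)

# Regression metrics work directly on continuous predictions.
print("MAE:", mean_absolute_error(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))

# If the target is a 0/1 label, thresholding (0.5 is an arbitrary choice)
# turns the predictions into classes that accuracy_score accepts.
print("accuracy @0.5:", accuracy_score(y_true, (y_pred >= 0.5).astype(int)))
```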
6 votes, 1 answer
Getting TypeError: Singleton array array(None, dtype=object) cannot be considered a valid collection
I am trying different cross-validation methods. I first used k-fold in my code and it worked perfectly well, but when I use RepeatedStratifiedKFold it gives me this error:
TypeError: Singleton array array(None, dtype=object) cannot be…

Rao Kiran
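One likely cause, offered as an assumption rather than a diagnosis: unlike `KFold`, stratified splitters need the labels, so `y` must reach `split` (or `cross_val_score`); passing `None` instead is a plausible source of the "Singleton array array(None, dtype=object)" message. A sketch with hypothetical balanced data:

```python
import numpy as np
from sklearn.model_selection import KFold, RepeatedStratifiedKFold

X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)  # hypothetical binary labels

# KFold ignores y, so split(X) alone works.
print(sum(1 for _ in KFold(n_splits=5).split(X)))

# Stratified splitters need the labels: always pass y to split() or to
# cross_val_score(model, X, y, cv=rskf).
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
splits = list(rskf.split(X, y))
print(len(splits), rskf.get_n_splits())
```

`RepeatedStratifiedKFold` yields `n_splits * n_repeats` splits, here 15.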
5 votes, 1 answer
Should I set shuffle=True or False in sklearn KFold cross-validation?
I'm studying cross-validation scores on my dataset using cross_val_score and KFold.
In particular, my code looks like this:
cross_val_score(estimator=model, X=X, y=y, scoring='r2', cv=KFold(shuffle=True))
My question is whether it's a common…

James Arten
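A small sketch of what the flag changes, on ten hypothetical rows: without shuffling, each test fold is a contiguous block in the original row order, so any ordering in the data carries into the folds; with shuffling, rows are permuted first, and fixing `random_state` keeps the folds reproducible. If the row order itself must be respected, as with time series, scikit-learn's `TimeSeriesSplit` is designed for that case instead.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)

# shuffle=False: folds are contiguous blocks in the original row order.
print([test.tolist() for _, test in KFold(n_splits=5).split(X)])

# shuffle=True: rows are permuted before splitting; random_state makes the
# permutation (and therefore the folds) reproducible across runs.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
print([test.tolist() for _, test in kf.split(X)])
```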
5 votes, 0 answers
StandardScaler to whole training dataset or to individual folds for Cross Validation
I'm currently using cross_val_score and KFold to assess the impact of using StandardScaler at different points within data pre-processing, specifically whether scaling the entire training dataset prior to performing cross validation introduces data…

AlexTerry
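A sketch, using scikit-learn's bundled breast-cancer dataset as a stand-in: placing `StandardScaler` and the model in a `Pipeline` means `cross_val_score` refits the scaler on the training folds of each split, so statistics from the held-out fold never reach the scaler.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler lives inside the pipeline, so in each split it is fit on the
# training folds only and merely applied to the held-out fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv)
print(scores.mean())
```

Scaling the whole training set before calling `cross_val_score` would instead let each validation fold influence the mean and variance used to transform it.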
4 votes, 1 answer
Huggingface Trainer(): K-Fold Cross Validation
I am following this tutorial from TowardsDataScience for text classification using Huggingface Trainer.
To get a more robust model I want to do a K-Fold Cross Validation, but I am not sure how to do this with Huggingface Trainer.
Is there a built-in…

Maxl Gemeinderat
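Whether or not Trainer offers a built-in option, one framework-agnostic approach is to generate the folds yourself and train a fresh model for each one. In the sketch below, `run_k_fold` and `train_and_evaluate` are hypothetical names, and a majority-class baseline stands in for the real Trainer run; with Hugging Face `datasets`, the index arrays could instead be passed to `Dataset.select`.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import StratifiedKFold


def run_k_fold(texts, labels, train_and_evaluate, n_splits=5, seed=42):
    """Run stratified k-fold CV, calling train_and_evaluate once per fold.

    train_and_evaluate is a hypothetical user-supplied callable: it should build
    a *fresh* model (and Trainer), train it on the training pairs and return a
    metric computed on the validation pairs.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(texts, labels):
        train = [(texts[i], labels[i]) for i in train_idx]
        val = [(texts[i], labels[i]) for i in val_idx]
        scores.append(train_and_evaluate(train, val))
    return float(np.mean(scores)), scores


def majority_baseline(train, val):
    """Stand-in for a real training run: predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == majority for _, label in val) / len(val)


texts = [f"example {i}" for i in range(20)]  # hypothetical data
labels = [0] * 12 + [1] * 8
mean_score, fold_scores = run_k_fold(texts, labels, majority_baseline)
print(mean_score, fold_scores)
```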
4 votes, 1 answer
How to implement K-Fold Cross Validation using ImageDataGenerator and flow_from_dataframe (using a CSV file)
Please show or explain a dummy code snippet demonstrating K-Fold Cross Validation with Flow_from_Dataframe, Training_Generator, and Valid_Generator objects in Keras.
This is the current code I have (no k-fold, only simple fitting…

Bhuvan S
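A sketch, assuming a hypothetical CSV with `filename` and `class` columns: split the DataFrame itself with `StratifiedKFold`, then build the training and validation generators from each fold's frames. The Keras calls appear only in comments and are not executed here.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical CSV contents: one row per image file with its class label.
df = pd.DataFrame({
    "filename": [f"img_{i:03d}.png" for i in range(12)],
    "class": ["cat"] * 6 + ["dog"] * 6,
})

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
fold_frames = []
for fold, (train_idx, valid_idx) in enumerate(skf.split(df, df["class"])):
    train_df = df.iloc[train_idx].reset_index(drop=True)
    valid_df = df.iloc[valid_idx].reset_index(drop=True)
    fold_frames.append((train_df, valid_df))
    # Per fold, these frames would be passed to something like
    # ImageDataGenerator(...).flow_from_dataframe(train_df, x_col="filename",
    # y_col="class", ...), likewise for valid_df, and a freshly built model
    # would then be fit on the training generator.
    print(fold, len(train_df), len(valid_df), valid_df["class"].value_counts().to_dict())
```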
4 votes, 1 answer
Does GridSearchCV return the best_estimator_ after fitting?
Let's say we tune an SVM with GridSearchCV like this:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
algorithm = SVC()
parameters = {'kernel': ['rbf', 'sigmoid'], 'C': [0.1, 1, 10]}
grid = GridSearchCV(algorithm, parameters)
grid.fit(X, y)
You then wish to use the best fit parameters/estimator in…

Bram Vanroy
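A sketch on the bundled iris dataset, with `SVC` as the concrete scikit-learn SVM class: with the default `refit=True`, `GridSearchCV` refits the best parameter combination on all the data passed to `fit`, exposes it as `best_estimator_`, and `grid.predict` delegates to that refitted estimator.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
parameters = {"kernel": ["rbf", "sigmoid"], "C": [0.1, 1, 10]}

# refit=True is the default: after the search, the best setting is refit on
# all of X, y and stored as best_estimator_.
grid = GridSearchCV(SVC(), parameters, cv=5)
grid.fit(X, y)

print(grid.best_params_)
print(grid.best_estimator_)
# grid.predict delegates to best_estimator_.
print((grid.predict(X) == grid.best_estimator_.predict(X)).all())
```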
3 votes, 2 answers
Application and Deployment of K-Fold Cross-Validation
K-Fold Cross Validation is a technique for splitting the data into K folds for training and testing. The goal is to estimate how well a machine learning model generalizes. The model is trained K times, once on each train fold…

notMyName
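A common pattern, sketched here on the iris dataset as an assumption rather than the only option: the K models trained during cross-validation serve only to estimate generalization performance, and the model that gets deployed is then fit once on all available data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Step 1: k-fold CV fits k cloned, throwaway models only to estimate how well
# this model configuration generalizes (model itself stays unfitted).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(f"estimated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Step 2: the model to deploy is fit once on all available data.
final_model = model.fit(X, y)
```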
3 votes, 2 answers
Difference between cross_val_score and KFold
I am learning machine learning and have a question. Can anyone tell me the difference between:
from sklearn.model_selection import cross_val_score
and
from sklearn.model_selection import KFold
I think both are used for k-fold cross…

Tob60
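A sketch on the iris dataset showing how the two relate: `KFold` only generates train/test index pairs, while `cross_val_score` clones the estimator, fits it on each training split and scores it on the matching test split. Running that loop by hand with the same `KFold` object reproduces its scores.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
# shuffle=True because iris rows are sorted by class.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# KFold only produces index pairs; the fit/score loop is up to you...
manual_scores = []
for train_idx, test_idx in kf.split(X):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))

# ...while cross_val_score runs that loop for you.
auto_scores = cross_val_score(model, X, y, cv=kf)
print(np.allclose(manual_scores, auto_scores))
```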