Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets rotate across successive rounds so that every data point gets a chance to be validated. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
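For illustration, a minimal k-fold sketch on toy data, using scikit-learn's KFold (the library most questions under this tag concern):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
    y = np.arange(10)

    # 5-fold CV: each round trains on 4 folds and validates on the held-out
    # fold, so every sample is validated exactly once across the 5 rounds.
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]
        print("validation indices:", val_idx)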

2604 questions
78 votes · 3 answers

difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

As the title says, I am wondering what the difference is between StratifiedKFold with the parameter shuffle=True, StratifiedKFold(n_splits=10, shuffle=True, random_state=0), and StratifiedShuffleSplit, StratifiedShuffleSplit(n_splits=10,…
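A rough sketch of the distinction on toy data: StratifiedKFold partitions the data into disjoint folds, while StratifiedShuffleSplit draws independent random splits that may overlap.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

    X = np.zeros((12, 2))               # feature values are irrelevant to the split
    y = np.array([0] * 6 + [1] * 6)     # balanced binary labels

    # StratifiedKFold: disjoint folds; every sample lands in a test set exactly once.
    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    for _, test_idx in skf.split(X, y):
        print("KFold test:", sorted(test_idx))

    # StratifiedShuffleSplit: independent random splits; test sets may overlap
    # across rounds, and some samples may never appear in any test set.
    sss = StratifiedShuffleSplit(n_splits=3, test_size=4, random_state=0)
    for _, test_idx in sss.split(X, y):
        print("ShuffleSplit test:", sorted(test_idx))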
68 votes · 3 answers

scikit-learn cross validation, negative values with mean squared error

When I use the following code with a data matrix X of size (952, 144) and an output vector y of size (952), the mean_squared_error metric returns negative values, which is unexpected. Do you have any idea why? from sklearn.svm import SVR from sklearn import…
asked by ahmethungari
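The short answer: scikit-learn scorers follow a greater-is-better convention, so the MSE scorer is negated. A minimal sketch on synthetic data:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=200, n_features=10, random_state=0)

    # Scorers must satisfy "greater is better", so the MSE scorer returns
    # -MSE; the negative values are expected, not a bug.
    scores = cross_val_score(SVR(), X, y, scoring="neg_mean_squared_error", cv=5)
    mse = -scores          # flip the sign to recover the usual positive MSE
    print(mse.mean())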
58 votes · 2 answers

How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)

I'm running GridSearchCV to optimize the parameters of a classifier in scikit-learn. Once I'm done, I'd like to know which parameters were chosen as the best. Whenever I try, I get an AttributeError: 'RandomForestClassifier' object has no attribute…
asked by sapo_cosmico
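The chosen parameters live on the fitted search object, not on the classifier that was passed in, which is why the attribute lookup fails. A minimal sketch with a hypothetical grid:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=300, random_state=0)

    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=3,
    )
    grid.fit(X, y)

    print(grid.best_params_)       # the winning parameter combination
    print(grid.best_estimator_)    # a classifier refit on all data with them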
57 votes · 6 answers

Using explicit (predefined) validation set for grid search with sklearn

I have a dataset, which has previously been split into 3 sets: train, validation and test. These sets have to be used as given in order to compare the performance across different algorithms. I would now like to optimize the parameters of my SVM…
asked by pir
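One common approach is PredefinedSplit, which tells GridSearchCV exactly which samples form the single validation fold. A sketch with stand-in arrays for the asker's fixed sets:

    import numpy as np
    from sklearn.model_selection import GridSearchCV, PredefinedSplit
    from sklearn.svm import SVC

    # Stand-ins for the pre-split train and validation sets.
    X_train, y_train = np.random.rand(80, 5), np.random.randint(0, 2, 80)
    X_val, y_val = np.random.rand(20, 5), np.random.randint(0, 2, 20)

    # -1 marks samples that are always in training; 0 marks the validation fold.
    X = np.concatenate([X_train, X_val])
    y = np.concatenate([y_train, y_val])
    test_fold = np.concatenate([np.full(len(X_train), -1), np.zeros(len(X_val))])

    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=PredefinedSplit(test_fold))
    grid.fit(X, y)
    print(grid.best_params_)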
57 votes · 6 answers

What is the difference between cross-validation and grid search?

In simple words, what is the difference between cross-validation and grid search? How does grid search work? Should I first do cross-validation and then a grid search?
asked by Linda
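In short: cross-validation scores one fixed model, while grid search tries many parameter combinations and uses cross-validation internally to score each one. A sketch:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Cross-validation alone: estimate how well one fixed model generalises.
    print(cross_val_score(SVC(C=1.0), X, y, cv=5).mean())

    # Grid search: try every combination, scoring each with cross-validation,
    # so the two are combined rather than done one after the other.
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)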
48 votes · 6 answers

module 'sklearn' has no attribute 'cross_validation'

I am trying to split my dataset into training and testing datasets, but I am getting this error: X_train,X_test,Y_train,Y_test = sklearn.cross_validation.train_test_split(X,df1['ENTRIESn_hourly']) AttributeError Traceback…
asked by Naren
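The sklearn.cross_validation module was deprecated in 0.18 and removed in 0.20; the same helpers now live in sklearn.model_selection. A sketch with stand-ins for the asker's X and df1:

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split  # new home since 0.18

    # Stand-ins for the asker's X and df1['ENTRIESn_hourly'].
    X = np.random.rand(100, 3)
    y = pd.Series(np.random.rand(100), name="ENTRIESn_hourly")

    X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)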
46 votes · 6 answers

How to split data into a balanced training set and test set in sklearn

I am using sklearn for a multi-classification task. I need to split all the data into a train_set and a test_set. I want to randomly take the same number of samples from each class. Currently, I am using this function: X_train, X_test, y_train, y_test =…
asked by Jeanne
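Assuming the classes are roughly balanced to begin with, passing stratify=y to train_test_split keeps the per-class proportions identical in both sets:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(90, 4)
    y = np.repeat([0, 1, 2], 30)    # three balanced classes

    # stratify=y preserves the class proportions in train and test alike.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )
    print(np.bincount(y_train), np.bincount(y_test))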
43 votes · 6 answers

Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead

I am working with sklearn's stratified k-fold split, and when I attempt to split using multi-class targets, I receive an error (see below). When I split using binary targets, it works with no problem. num_classes = len(np.unique(y_train)) y_train_categorical =…
asked by jKraut
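StratifiedKFold rejects one-hot ("multilabel-indicator") targets; a common workaround is to hand split() the integer class labels while keeping the one-hot matrix for training:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    X = np.random.rand(30, 4)
    y_onehot = np.eye(3)[np.repeat([0, 1, 2], 10)]   # one-hot targets

    # split() on the integer labels; index the one-hot matrix afterwards.
    y_labels = np.argmax(y_onehot, axis=1)
    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y_labels):
        X_tr, y_tr = X[train_idx], y_onehot[train_idx]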
41 votes · 4 answers

return coefficients from Pipeline object in sklearn

I've fit a Pipeline object with RandomizedSearchCV pipe_sgd = Pipeline([('scl', StandardScaler()), ('clf', SGDClassifier(n_jobs=-1))]) param_dist_sgd = {'clf__loss': ['log'], 'clf__penalty': [None, 'l1', 'l2',…
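A Pipeline has no coef_ of its own; the fitted step has to be reached by name. A sketch mirroring the asker's scaler-plus-SGD setup:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=200, random_state=0)

    pipe = Pipeline([("scl", StandardScaler()), ("clf", SGDClassifier())])
    pipe.fit(X, y)

    # Reach into the fitted pipeline step by its name.
    print(pipe.named_steps["clf"].coef_)

    # After a search, the fitted pipeline is search.best_estimator_:
    # search.best_estimator_.named_steps["clf"].coef_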
38 votes · 8 answers

How to extract model hyper-parameters from spark.ml in PySpark?

I'm tinkering with some cross-validation code from the PySpark documentation, and trying to get PySpark to tell me what model was selected: from pyspark.ml.classification import LogisticRegression from pyspark.ml.evaluation import…
asked by Paul
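A sketch of one way to read the selected parameters back, assuming a toy DataFrame: cvModel.bestModel is the winning model, and extractParamMap() lists its effective parameter settings.

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    train = spark.createDataFrame(
        [(Vectors.dense([0.0, 1.0]), 0.0), (Vectors.dense([1.0, 0.0]), 1.0)] * 10,
        ["features", "label"])

    lr = LogisticRegression()
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
    cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(), numFolds=3)
    cvModel = cv.fit(train)

    # The winning model; its effective settings can be read back.
    for param, value in cvModel.bestModel.extractParamMap().items():
        print(param.name, "=", value)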
37 votes · 4 answers

Using statsmodel estimations with scikit-learn cross validation, is it possible?

I posted this question on the Cross Validated forum and later realized it might find a more appropriate audience on Stack Overflow instead. I am looking for a way to use the fit object (result) obtained from Python statsmodels to feed into…
asked by CARTman
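One workable pattern is a thin adapter that gives the statsmodels model the fit/predict interface scikit-learn's CV utilities expect; SMWrapper below is a hypothetical name, sketched for OLS:

    import numpy as np
    import statsmodels.api as sm
    from sklearn.base import BaseEstimator, RegressorMixin
    from sklearn.model_selection import cross_val_score

    class SMWrapper(BaseEstimator, RegressorMixin):
        """Adapter exposing statsmodels OLS through sklearn's estimator API."""

        def fit(self, X, y):
            self.result_ = sm.OLS(y, sm.add_constant(X)).fit()
            return self

        def predict(self, X):
            return self.result_.predict(sm.add_constant(X))

    X = np.random.rand(100, 3)
    y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * np.random.randn(100)
    print(cross_val_score(SMWrapper(), X, y, cv=5, scoring="r2"))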
36 votes · 2 answers

What is OOF approach in machine learning?

I have seen people in many Kaggle notebooks talk about the OOF approach when they do machine learning with k-fold validation. What is OOF, and is it related to k-fold validation? Also, can you suggest some useful resources to get the concept in…
asked by Nikhil Mishra
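OOF stands for "out-of-fold": under k-fold CV, each sample gets a prediction from the one fold-model that did not train on it. In scikit-learn, cross_val_predict produces exactly these predictions:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=300, random_state=0)

    # Each sample is predicted by the fold-model that never saw it, so the
    # predictions are honest; they are often reused as stacking features.
    oof_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(accuracy_score(y, oof_pred))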
36 votes · 3 answers

Difference between cross_val_score and cross_val_predict

I want to evaluate a regression model built with scikit-learn using cross-validation, and I am getting confused about which of the two functions, cross_val_score and cross_val_predict, I should use. One option would be: cvs = DecisionTreeRegressor(max_depth =…
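Roughly: cross_val_score returns one score per fold (use it to summarise performance), while cross_val_predict returns one out-of-fold prediction per sample (use it to inspect residuals or plot predicted vs. actual). A sketch:

    from sklearn.datasets import make_regression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import cross_val_predict, cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=200, noise=10, random_state=0)
    model = DecisionTreeRegressor(max_depth=4, random_state=0)

    # One score per fold:
    print(cross_val_score(model, X, y, cv=5, scoring="r2"))

    # One out-of-fold prediction per sample; a single score computed from
    # the pooled predictions need not equal the mean of the fold scores.
    pred = cross_val_predict(model, X, y, cv=5)
    print(r2_score(y, pred))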
35 votes · 4 answers

How is scikit-learn cross_val_predict accuracy score calculated?

Does cross_val_predict (see doc, v0.18) with the k-fold method, as shown in the code below, calculate accuracy for each fold and average them at the end, or not? cv = KFold(len(labels), n_folds=20) clf = SVC() ypred = cross_val_predict(clf, td, labels,…
asked by Roman
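It does not average per-fold accuracies: cross_val_predict only returns pooled out-of-fold predictions, and any score computed from them weights every sample equally. With unequal fold sizes the two quantities can differ, as this sketch shows:

    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=103, random_state=0)  # uneven folds
    cv = KFold(n_splits=5)

    # Mean of per-fold accuracies (what cross_val_score reports):
    print(cross_val_score(SVC(), X, y, cv=cv).mean())

    # Accuracy over the pooled out-of-fold predictions:
    print(accuracy_score(y, cross_val_predict(SVC(), X, y, cv=cv)))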
33 votes · 3 answers

Early stopping with Keras and sklearn GridSearchCV cross-validation

I wish to implement early stopping with Keras and sklearn's GridSearchCV. The working code example below is modified from How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras. The data set may be downloaded from here. The…
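A sketch of the usual pattern, assuming the legacy keras.wrappers.scikit_learn wrapper (newer stacks use the scikeras package instead): keyword arguments passed to GridSearchCV.fit are forwarded to the underlying Keras fit, so an EarlyStopping callback applies inside every CV fold.

    from keras.callbacks import EarlyStopping
    from keras.layers import Dense
    from keras.models import Sequential
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, n_features=8, random_state=0)

    def build_model(units=16):
        model = Sequential([Dense(units, activation="relu", input_shape=(8,)),
                            Dense(1, activation="sigmoid")])
        model.compile(loss="binary_crossentropy", optimizer="adam")
        return model

    clf = KerasClassifier(build_fn=build_model, epochs=100, verbose=0)
    grid = GridSearchCV(clf, {"units": [8, 16]}, cv=3)

    # validation_split carves the monitored set out of each training fold.
    grid.fit(X, y,
             callbacks=[EarlyStopping(monitor="val_loss", patience=3)],
             validation_split=0.2)
    print(grid.best_params_)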