
I am trying to use stratified cross-validation after splitting the data into training and test sets and after standardizing the training set (the code is shown in the screenshot linked below).

After doing this it gives me an error:

positional indexers are out-of-bounds

How can I fix this?

(Code screenshot: https://i.stack.imgur.com/R2bwR.png)

  • I changed these lines of code, but it still gives an error. I checked again whether the data has the same size and the same indices, and everything matches, but the code still gives an error. – Alexander Wilkiel Aug 28 '23 at 04:38
  • `for idx, (train_index, test_index) in enumerate(skf.split(X_train_std, y_train)): X_train, X_test = X_train_std.iloc[train_index], X_train_std.iloc[test_index]; y_train, y_test = y_train.iloc[train_index], y_train.iloc[test_index]` – Alexander Wilkiel Aug 28 '23 at 04:41 (a cleaned-up sketch of this loop follows these comments)
  • Why are you trying to slice `X_test` with fold indices generated from `X_train`? – Ben Reiniger Aug 28 '23 at 11:40
  • Always provide the full error traceback, and [don't include code as images](https://meta.stackoverflow.com/q/285551/10495893). – Ben Reiniger Aug 28 '23 at 11:42
  • Please don't add comments to clarify. Code is unreadable in comments, and comments appear ordered by votes, not chronologically. [Edit] your question instead. See also [ask]. – Robert Aug 28 '23 at 22:52
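
A likely cause of the "positional indexers are out-of-bounds" error in the loop quoted above is that `y_train` (and `X_train`) are overwritten inside the loop, so from the second fold onward the indices generated from the full training set no longer fit the shrunken objects. Below is a minimal sketch of the same split with distinct names for the fold slices; the synthetic `X_train_std` and `y_train` are only placeholders, assumed to be pandas objects of matching length.

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# synthetic stand-ins for the question's standardized training data (hypothetical)
rng = np.random.default_rng(0)
X_train_std = pd.DataFrame(rng.normal(size=(100, 4)))
y_train = pd.Series(rng.integers(0, 2, size=100))

skf = StratifiedKFold(n_splits=9, shuffle=True, random_state=7)

for idx, (train_index, val_index) in enumerate(skf.split(X_train_std, y_train)):
    # use new names for the fold slices so X_train_std and y_train are never overwritten;
    # overwriting them makes the next fold's positional indices fall out of bounds
    X_tr, X_val = X_train_std.iloc[train_index], X_train_std.iloc[val_index]
    y_tr, y_val = y_train.iloc[train_index], y_train.iloc[val_index]
    print(f"fold {idx + 1}: {len(X_tr)} training rows, {len(X_val)} validation rows")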

1 Answer


The correct way to perform cross-validation is to scale the data inside the cross-validation loop, after splitting into folds, not before: if you scale the whole training set first and then split it into folds, information from the validation fold leaks into the scaling statistics used for evaluation.

from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

scv = StratifiedKFold(n_splits=9, shuffle=True, random_state=7)

def cross_validation_score(cv):
    scores_per_fold = []
    # X and Y are assumed to be NumPy arrays defined earlier
    for fold_no, (train, test) in enumerate(cv.split(X, Y)):
        print(f"Training for Fold {fold_no + 1}")
        sc = StandardScaler()
        x_train, x_test = X[train], X[test]
        # fit the scaler on the training fold only, then apply it to the validation fold
        x_train = sc.fit_transform(x_train)
        x_test = sc.transform(x_test)
        # build_model is a function where the neural network is defined
        nn = build_model()
        nn.fit(x_train, Y[train], batch_size=20, epochs=50, verbose=0)
        scores = nn.evaluate(x_test, Y[test])
        scores_per_fold.append(scores)
    return scores_per_fold

skcv_scores = cross_validation_score(scv)

You can also use sklearn.pipeline.Pipeline. To use a pipeline in the code above I would need a scikit-learn wrapper around the Keras model; I left that out to keep the example simpler to understand.
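
As a rough illustration of that pipeline approach, here is a minimal sketch using a plain scikit-learn classifier in place of the neural network (a Keras model would additionally need a scikit-learn wrapper such as the one provided by the scikeras package); the data here is a synthetic placeholder.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# synthetic placeholder data standing in for X and Y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = rng.integers(0, 2, size=200)

# the scaler lives inside the pipeline, so it is refitted on each training fold only
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

scv = StratifiedKFold(n_splits=9, shuffle=True, random_state=7)
scores = cross_val_score(pipe, X, Y, cv=scv)
print(scores.mean(), scores.std())

Because the StandardScaler sits inside the pipeline, cross_val_score refits it on the training portion of every fold, which gives the same leakage-free behaviour as the manual loop above.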

Sauron
  • Check this out: https://stackoverflow.com/questions/58939568/standardize-data-with-k-fold-cross-validation – Sauron Aug 28 '23 at 13:06