IndexError for df_valid

Question

I am using Python 3.6.8.

I was using a loop to convert the values in some columns to int:

for i in cols:
    df_valid[[i]] = df_valid[[i]].astype(int)

which raised the following error:

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

As the full code below shows, I did the same thing with df_train, and it did not raise any error. I think it has something to do with df_valid = imputer.transform(df_valid), but I have not been able to resolve it.

Can you please help and point me in the right direction for solving this error?

My full code is shown below:

import argparse
import os

import joblib
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn import metrics

import config
import model_dispatcher


def run(fold, model):

    df = pd.read_csv(config.TRAINING_FILE)

    df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})
    df["Married"] = df["Married"].map({"No": 0, "Yes": 1})
    df["Self_Employed"] = df["Self_Employed"].map({"No": 0, "Yes": 1})
    df["Dependents"] = df["Dependents"].map({"0": 0, "1": 1, "2": 2, "3+": 3})
    df["Education"] = df["Education"].map({"Graduate": 1, "Not Graduate": 0})
    df["Loan_Status"] = df["Loan_Status"].map({"N": 0, "Y": 1})

    cols = ["Gender",
            "Married",
            "Dependents",
            "Education",
            "Self_Employed",
            "Credit_History",
            "Loan_Status"]

    dummy = pd.get_dummies(df["Property_Area"])
    df = pd.concat([df, dummy], axis=1)
    df = df.drop(["Loan_ID", "Property_Area"], axis=1)

    df_train = df[df.kfold != fold].reset_index(drop=True)

    df_valid = df[df.kfold == fold].reset_index(drop=True)

    imputer = KNNImputer(n_neighbors=18)
    df_train = pd.DataFrame(imputer.fit_transform(df_train),
                            columns=df_train.columns)
    for i in cols:
        df_train[[i]] = df_train[[i]].astype(int)

    df_valid = imputer.transform(df_valid)
    for i in cols:
        df_valid[[i]] = df_valid[[i]].astype(int)

    df_train['GxM'] = df_train.apply(lambda row:
                                     (row['Gender']*row['Married']),
                                     axis=1)
    df_train['Income_sum'] = (
        df_train.apply(lambda row:
                       (row['ApplicantIncome'] +
                        row['CoapplicantIncome']),
                       axis=1))
    df_train['DxE'] = df_train.apply(lambda row: (row['Education'] *
                                                  row['Dependents']),
                                     axis=1)
    df_train['DxExG'] = (
        df_train.apply(lambda row:
                       (row['Education'] *
                        row['Dependents'] *
                        row['Gender']),
                       axis=1))

    df_valid['GxM'] = df_valid.apply(lambda row:
                                     (row['Gender']*row['Married']),
                                     axis=1)
    df_valid['Income_sum'] = (
        df_valid.apply(lambda row:
                       (row['ApplicantIncome'] +
                        row['CoapplicantIncome']),
                       axis=1))
    df_valid['DxE'] = df_valid.apply(lambda row: (row['Education'] *
                                                  row['Dependents']),
                                     axis=1)
    df_valid['DxExG'] = (
        df_valid.apply(lambda row:
                       (row['Education'] *
                        row['Dependents'] *
                        row['Gender']),
                       axis=1))

    X_train = df_train.drop("Loan_Status", axis=1).values
    y_train = df_train.Loan_Status.values

    X_valid = df_valid.drop("Loan_Status", axis=1).values
    y_valid = df_valid.Loan_Status.values

    clf = model_dispatcher.models[model]

    clf.fit(X_train, y_train)

    preds = clf.predict(X_valid)

    rascore = metrics.roc_auc_score(y_valid, preds)
    print(f"Fold = {fold}, ROC-AUC = {rascore}")

    joblib.dump(
        clf,
        os.path.join(config.MODEL_OUTPUT, f"dt_{fold}.bin")
    )

if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    parser.add_argument("--fold", type=int)

    parser.add_argument("--model", type=str)

    args = parser.parse_args()

    run(fold=args.fold, model=args.model)

I cleaned up your code a bit; please see if it's OK. P.S. I didn't downvote. I think you have a lot of code here. You may want to share only the relevant piece so it's clear what we need to look for; too much code is noise that distracts from the main problem. I have cleaned up the question. See if it looks OK. — Joe Ferndz, Mar 30 '21 at 04:15
Did you mean to write `df_valid[i]` instead of `df_valid[[i]]`? Also, if you want to convert all the columns to integers, you don't have to loop over them like this. — Joe Ferndz, Mar 30 '21 at 04:20
You can use `df_valid.apply(pd.to_numeric).dtypes` to convert all of them to a numeric datatype. — Joe Ferndz, Mar 30 '21 at 04:22

score 2 · Accepted Answer · answered Mar 30 '21 at 04:37

To convert all the columns to a numeric format, you can use:

df_valid.apply(pd.to_numeric).dtypes
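
As a minimal sketch of what this call does, assuming a small made-up DataFrame (`df_demo` and its values are hypothetical, not the asker's data): `apply(pd.to_numeric)` returns a new, converted DataFrame, and `.dtypes` only reports the resulting column types.

```python
import pandas as pd

# Hypothetical toy data (not the asker's file): numbers stored as strings.
df_demo = pd.DataFrame({
    "Gender": ["1", "0", "1"],
    "Credit_History": ["1.0", "0.0", "1.0"],
})

# apply() returns a new DataFrame; assign it to keep the conversion.
converted = df_demo.apply(pd.to_numeric)

# .dtypes only reports the resulting column types.
print(converted.dtypes)
```

Note that `pd.to_numeric` keeps a float dtype for values such as "1.0", so a column that must be integer would still need an explicit `astype(int)`. The call also needs a pandas DataFrame; a NumPy array has no `apply` method.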

For more details on pd.to_numeric, see the pandas documentation.

You may also want to read more about converting data to different datatypes in this Stack Overflow response.
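
The question suspects the line `df_valid = imputer.transform(df_valid)`. The sketch below, which uses hypothetical toy frames (`train`, `valid`) rather than the asker's data, illustrates why that line is a likely cause: by default `KNNImputer.transform` returns a NumPy array, and indexing an array with a list of column names raises the `IndexError` quoted in the question. One possible fix, mirroring what the question already does for `df_train`, is to wrap the result in a DataFrame before converting dtypes.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data standing in for the train/validation folds.
train = pd.DataFrame({"Gender": [1.0, 0.0, np.nan, 1.0],
                      "ApplicantIncome": [5000.0, 3000.0, 4000.0, 6000.0]})
valid = pd.DataFrame({"Gender": [np.nan, 1.0],
                      "ApplicantIncome": [4800.0, 5200.0]})
cols = ["Gender"]

imputer = KNNImputer(n_neighbors=2)
imputer.fit(train)

# With default settings, transform() returns a NumPy array, not a DataFrame.
raw = imputer.transform(valid)
try:
    raw[["Gender"]]  # column-name indexing is not valid on an array
    raised = False
except IndexError:
    raised = True

# Wrap the array in a DataFrame so column names work again.
valid_imputed = pd.DataFrame(imputer.transform(valid), columns=valid.columns)
valid_imputed = valid_imputed.astype({col: int for col in cols})
print(valid_imputed.dtypes)
```

Passing a dict to `astype` converts only the listed columns and leaves the rest as floats, which replaces the per-column loop from the question.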
