
I want to implement KNN in Python. So far I have loaded my data into a pandas DataFrame.

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
train_df = pd.read_csv("creditlimit_train.csv") # train dataset
train_df.head()

The output of head is

SNo      Salary      LoanAmt   Level
101      100000      10000     Low Level
102      108500      11176     Low Level
103      125500      13303     Low Level
104      134000      14606     Low Level
105      142500      15960     Low Level


test_df = pd.read_csv("creditlimit_test.csv")
test_df.head()

The output of head is

SNo      Salary      LoanAmt   Level
101      100000      10000     Low Level
102      108500      11176     Low Level
103      125500      13303     Low Level
104      134000      14606     Low Level
105      142500      15960     Low Level

neigh = KNeighborsClassifier(n_neighbors=5,algorithm='auto')
predictor_features = ['Salary','LoanAmt']
dependent_features = ['Level']
neigh.fit(train_df[predictor_features],train_df[dependent_features])

How do I use the fit function with Salary and LoanAmt as predictors, and then predict the Level for my test_df?

Update 1: There are three levels: Low, Medium and High.

Ash Upadhyay
    You should probably try looking at the sklearn API. http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier – ec2604 Jun 21 '18 at 06:34

1 Answer


You can convert your DataFrame to NumPy arrays and pass them as input to fit:

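import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("creditlimit_train.csv")  # train dataset from the question

# convert class labels to numeric codes. map() returns a new Series,
# so the result must be assigned back. The question lists three levels; the
# spellings 'Medium Level' and 'High Level' are assumed here.
# (scikit-learn also accepts string labels, so this step is optional.)
df['Level'] = df['Level'].map({'Low Level': 0, 'Medium Level': 1, 'High Level': 2})

# extract data and class labels
data = df[['Salary','LoanAmt']]
target = df['Level']

# convert df to numpy arrays
data = data.values
target = target.values

# you would ideally want to do a train/test split:
# train the model on the training data and test on the test data for accuracy

# pass the arrays to the fit function: features first, then labels
neigh = KNeighborsClassifier(n_neighbors=5,algorithm='auto')
neigh.fit(data,target)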
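
To get predictions for the test set, call predict on the same feature columns after fitting. The following is only a minimal sketch: the small inline frames are made-up stand-ins for creditlimit_train.csv and creditlimit_test.csv so the snippet runs on its own, the 'Medium Level' and 'High Level' rows are invented for illustration, and n_neighbors=3 is chosen only to suit the toy data.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for creditlimit_train.csv. The first three rows mirror
# the head() output in the question; the remaining rows are invented.
train_df = pd.DataFrame({
    "Salary":  [100000, 108500, 125500, 300000, 310000, 320000, 600000, 610000, 620000],
    "LoanAmt": [10000, 11176, 13303, 40000, 41000, 42000, 90000, 91000, 92000],
    "Level":   ["Low Level"] * 3 + ["Medium Level"] * 3 + ["High Level"] * 3,
})

# Hypothetical stand-in for creditlimit_test.csv (invented values).
test_df = pd.DataFrame({
    "Salary":  [105000, 315000, 615000],
    "LoanAmt": [10500, 41500, 91500],
})

predictor_features = ["Salary", "LoanAmt"]

# n_neighbors=3 only because the toy data has three rows per class.
neigh = KNeighborsClassifier(n_neighbors=3)

# Pass the labels as a 1-D Series (train_df["Level"]), not a one-column DataFrame.
neigh.fit(train_df[predictor_features], train_df["Level"])

# predict() expects the same feature columns, in the same order, as fit().
test_df["PredictedLevel"] = neigh.predict(test_df[predictor_features])
print(test_df)
```

String labels work here without any numeric conversion. Because KNN is distance-based and Salary is on a much larger scale than LoanAmt, scaling the features first (for example with sklearn.preprocessing.StandardScaler) may be worth trying on the real data.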

Some useful links:

Convert pandas dataframe to numpy array, preserving index

Replacing few values in a pandas dataframe column with another value

Selecting columns in a pandas dataframe

Yuvraj Jaiswal
    Thank you for your answer. I think that is going to help me for sure :) Yuvraj bhai, I am looking for a way to compare my train data with the test data. The data is almost the same, with some entries purposely changed for testing. How can I compare two different datasets? – Ash Upadhyay Jun 21 '18 at 06:42
    What data do you want to compare? If you want to check how well your model has predicted, please see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html – Yuvraj Jaiswal Jun 21 '18 at 07:03
    Glad to be of help – Yuvraj Jaiswal Jun 21 '18 at 07:26
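
The evaluation suggested in the comments could look roughly like the sketch below. It assumes the test set also carries the true Level column (as the test_df.head() output in the question suggests), so predictions can be compared against it. The two label lists are invented purely to show the calls; in practice they would be test_df['Level'] and the output of neigh.predict.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true labels (would come from test_df["Level"]) and
# hypothetical predictions (would come from neigh.predict(...)).
y_true = ["Low Level", "Low Level", "Medium Level", "Medium Level", "High Level", "High Level"]
y_pred = ["Low Level", "Medium Level", "Medium Level", "Medium Level", "High Level", "High Level"]

# Fraction of rows whose predicted level matches the true level.
accuracy = accuracy_score(y_true, y_pred)

# Text table with per-class precision, recall and F1 score.
report = classification_report(y_true, y_pred)

print(f"accuracy: {accuracy:.3f}")
print(report)
```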