How to find the relative similarity/difference between items using KNN & Scikit-Learn?

Question

I am trying to measure the relative similarity (or difference) between items, such as basketball players, and I am using a KNN classifier for this task. For instance, based on the data, I want to see how similar Lebron James is to Carmelo Anthony, how similar Lebron James is to Ray Allen, and how similar Carmelo Anthony is to Ray Allen. In short, I want to compare each player with every other player.

I am running the code below.

import numpy as np  
import matplotlib.pyplot as plt  
import pandas as pd  

# read_csv already returns a DataFrame, so no separate conversion is needed
dataset = pd.read_csv('C:\\path_here\\nba.csv')
print(dataset.columns.values)
print(dataset.dtypes)

# fill NAs with zeros
dataset = dataset.fillna(0)

dataset.isnull().sum()
dataset.isnull().sum().sum()


dataset.head()  


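# features: columns 4 through 26 (the slice end is exclusive)
X = dataset.iloc[:, 4:27]
# target: column 28 (column 27 is not used)
y = dataset.iloc[:, 28]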


from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)  


from sklearn.preprocessing import StandardScaler  
scaler = StandardScaler()  
scaler.fit(X_train)

X_train = scaler.transform(X_train)  
X_test = scaler.transform(X_test) 


from sklearn.neighbors import KNeighborsClassifier  
classifier = KNeighborsClassifier(n_neighbors=5)  
classifier.fit(X_train, y_train) 

y_pred = classifier.predict(X_test)  


from sklearn.metrics import classification_report, confusion_matrix  
print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred))

The data comes from here:

https://www.dropbox.com/s/b3nv38jjo5dxcl6/nba_2013.csv?dl=0

The code runs fine, but the output looks odd, or perhaps I forgot to include something. What I am trying to get is results something like this:

LebronJames vs. CarmeloAnthony: .95
CarmeloAnthony vs. RayAllen: .92
RayAllen vs. LebronJames: .91

https://pythonhosted.org/scikit-fuzzy/auto_examples/plot_cmeans.html

Fuzzy Group By, Grouping Similar Words

Why are you using a classifier for this? You can directly build a pairwise similarity/distance matrix based on any distance metric you want. Refer to this if you wish to build a distance matrix: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html - panktijk, Jun 03 '19 at 23:22
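
To illustrate the comment above, here is a minimal sketch of the pairwise-matrix approach. It rests on several assumptions that are not in the question: the three stat columns and their values are made up purely for illustration (they are not real statistics), the features are standardized before measuring Euclidean distance, and distance is turned into a similarity score with 1 / (1 + distance), which is only one of many possible conversions. The helper name compare is hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import StandardScaler

# Made-up illustrative numbers, NOT real player statistics.
stats = pd.DataFrame(
    {
        "pts": [27.0, 27.5, 9.5],
        "ast": [6.5, 3.0, 2.0],
        "trb": [7.0, 8.0, 3.0],
    },
    index=["LebronJames", "CarmeloAnthony", "RayAllen"],
)

# Standardize so that no single column dominates the distance.
scaled = StandardScaler().fit_transform(stats)

# Full player-by-player distance matrix (0 on the diagonal).
dist = pairwise_distances(scaled, metric="euclidean")

# Assumed conversion: map distance in [0, inf) to similarity in (0, 1].
similarity = pd.DataFrame(
    1.0 / (1.0 + dist), index=stats.index, columns=stats.index
)


def compare(a, b):
    """Hypothetical helper: similarity score between two players."""
    return float(similarity.loc[a, b])


# Print one line per unordered pair of players.
for a, b in combinations(stats.index, 2):
    print(f"{a} vs. {b}: {compare(a, b):.2f}")
```

With the real data, stats would be the numeric columns of the dataset indexed by player name. Passing metric="cosine" to pairwise_distances, or using sklearn.metrics.pairwise.cosine_similarity, would be an alternative if direction matters more than magnitude.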
