I have a dataset in .csv format that contains 2099846 rows and 38 columns. I want to calculate the Euclidean distance between every pair of rows and store the results in another 2D array.
import pandas as pd
import numpy as np
data = pd.read_csv('fraudDataset.csv', encoding= 'unicode_escape')
row = len(data)
data = data.astype(int)
distanceMatrix = np.zeros((np.shape(data)))
for datai in range(len(data)):
    for dataj in range(datai + 1, len(data)):
        distanceMatrix[datai, dataj] = np.linalg.norm(data[3] - data[4], ord=None, axis=None, keepdims=False)
But it gives the following error:
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 3
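The `KeyError: 3` appears to come from `data[3]`: on a pandas DataFrame, square brackets with a scalar look up a column label, not a row position, so the lookup fails unless the CSV has a column literally named `3`. Below is a minimal sketch of that behaviour on a made-up frame (the column names are hypothetical, not taken from `fraudDataset.csv`), with positional row access through `.iloc` as one possible alternative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: 5 rows, 3 named columns.
df = pd.DataFrame(
    np.arange(15).reshape(5, 3),
    columns=["amount", "step", "flag"],
)

try:
    df[3]  # looks for a COLUMN labelled 3, which does not exist here
except KeyError as exc:
    print("KeyError:", exc)

# Rows are selected by position with .iloc
row_i = df.iloc[3].to_numpy()
row_j = df.iloc[4].to_numpy()
dist = np.linalg.norm(row_i - row_j)
print(dist)
```

In the loop from the post, the same idea would presumably use the loop variables (`data.iloc[datai]` and `data.iloc[dataj]`) rather than the fixed indices 3 and 4.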
Could you please help me with this task?
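Two things are worth noting about the task itself. First, `np.zeros(np.shape(data))` creates an array with the shape of the data (rows × 38), not a square rows × rows matrix. Second, a full 2099846 × 2099846 matrix of float64 values would need roughly 35 TB, so it cannot be held in memory as a single NumPy array. The sketch below is one possible approach under stated assumptions: it uses a small random array in place of the real data, and a hypothetical helper `pairwise_block` that computes distances one block of rows at a time using the identity ||a − b||² = ||a||² + ||b||² − 2a·b.

```python
import numpy as np

def pairwise_block(block, all_rows):
    """Euclidean distances between every row of `block` and every row of `all_rows`."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once
    sq_block = np.sum(block ** 2, axis=1)[:, None]
    sq_all = np.sum(all_rows ** 2, axis=1)[None, :]
    sq_dist = sq_block + sq_all - 2.0 * block @ all_rows.T
    # Rounding can leave tiny negative values; clip before the square root
    return np.sqrt(np.maximum(sq_dist, 0.0))

# Small random stand-in for the data (assumption: 1000 rows, 38 numeric columns)
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(1000, 38)).astype(np.float64)

n = X.shape[0]
chunk = 256  # rows per block; an arbitrary illustrative value
distance_matrix = np.zeros((n, n))  # square (n, n), not the shape of the data
for start in range(0, n, chunk):
    stop = min(start + chunk, n)
    distance_matrix[start:stop] = pairwise_block(X[start:stop], X)
```

For the full dataset, each block would have to be consumed or written to disk as it is produced rather than stored in one array. For data small enough to fit in memory, `scipy.spatial.distance.pdist` and `cdist` compute the same distances without the explicit loop.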