
I have a dataset in .csv format that contains 2099846 rows and 38 columns. I want to calculate the Euclidean distance between every pair of rows and store the results in another 2D array.

import pandas as pd
import numpy as np


data = pd.read_csv('fraudDataset.csv', encoding= 'unicode_escape')
row = len(data)

data = data.astype(int)

distanceMatrix = np.zeros((np.shape(data)))




for datai  in range(len(data)):
     for dataj in range( datai + 1,len(data)):
            distanceMatrix[datai,dataj] = np.linalg.norm(data[3] - data[4], ord=None, axis=None, keepdims=False)     
    

But it gives the following error:

   return self._engine.get_loc(casted_key)

  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 3

Could you please help me with this task?

martineau
Zahra
  • Is it helpful? https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy – Mojtaba Valizadeh Oct 30 '21 at 08:44
  • Actually NO. I know how to calculate the Euclidean distance and I have tried to do it but it gave me the same error – Zahra Oct 30 '21 at 08:59

1 Answer


I cannot reproduce the problem, because the question does not say enough about the data, so I cannot suggest a fix for the error message itself. From your description, though, I think the cdist function from scipy.spatial.distance* could solve your problem. Since you have not provided an example data row, I created an integer matrix A.

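import numpy as np
from scipy.spatial.distance import cdist

# Example data: 10 rows, 10 integer features each
A = np.random.randint(10, size=(10, 10))

# B[i, j] is the Euclidean distance between row i and row j of A
B = cdist(A, A, metric='euclidean')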

B is a symmetric matrix, naturally.
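
As for the KeyError: 3 itself, one likely cause (an assumption, since the CSV is not shown) is that data[3] on a DataFrame looks up a column labelled 3 rather than row 3. Note also that np.zeros(np.shape(data)) has one column per feature, not one per row, and that the loop always compares the fixed indices 3 and 4. A minimal sketch with a made-up DataFrame, converting it to a NumPy array so that integer indexing selects rows, and checking the loop against cdist:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical stand-in for the CSV: 6 rows, 4 named columns.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 10, size=(6, 4)), columns=list("abcd"))

# X[i] is row i; data[i] would instead look up a column labelled i.
X = data.to_numpy(dtype=float)
n = len(X)

# One entry per pair of rows, so the shape is (n, n), not data.shape.
distance_matrix = np.zeros((n, n))

for i in range(n):
    for j in range(i + 1, n):
        d = np.linalg.norm(X[i] - X[j])
        distance_matrix[i, j] = d
        distance_matrix[j, i] = d  # fill the lower triangle as well

# The explicit loop and cdist should agree.
same_as_cdist = np.allclose(distance_matrix, cdist(X, X, metric="euclidean"))
```

The loop is only there to show the indexing; for anything beyond a few thousand rows, cdist will be much faster.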

* https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
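
One caveat for the size mentioned in the question: with 2099846 rows, the full distance matrix would have about 4.4 × 10^12 entries, roughly 35 TB as 64-bit floats, so it cannot be held in memory at once. A possible workaround is to compute it in row blocks and reduce each block before moving on. The sketch below assumes you only need a per-row summary (here, the distance to the nearest other row); the helper name iter_distance_blocks and the block size are hypothetical choices for illustration, not part of SciPy.

```python
import numpy as np
from scipy.spatial.distance import cdist

def iter_distance_blocks(X, block_size=1000):
    """Yield (start, stop, block); block holds distances from rows start:stop to all rows."""
    for start in range(0, len(X), block_size):
        stop = min(start + block_size, len(X))
        yield start, stop, cdist(X[start:stop], X, metric="euclidean")

# Small made-up data with 38 columns, standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(25, 38)).astype(float)

# Example reduction: distance from each row to its nearest other row.
nearest = np.empty(len(X))
for start, stop, block in iter_distance_blocks(X, block_size=10):
    # Mask each row's distance to itself so it is not picked as the minimum.
    block[np.arange(stop - start), np.arange(start, stop)] = np.inf
    nearest[start:stop] = block.min(axis=1)
```

Each block still has one column per row of the dataset, so at full size even 100 rows per block take about 1.7 GB; block_size has to be chosen to fit the available memory.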