
I have this data:

[image: data]

It is much larger than this, but I am only testing with the first 5 entries. I am trying to compute the pairwise Euclidean distance from the latitude and longitude. My method works when I use the latitude and longitude directly as vectors, but when I wrapped it in a function, I get completely different results.

For example when I do this:

D = np.zeros((len(x), len(y)))
for i in range(len(x)):
    for j in range(len(y)):
        D[i][j] = np.sqrt((x[i] - x[j])**2 + (y[i] - y[j])**2)

print(D)

I get the correct result:

[[0.00000000e+00 1.88271381e+02 1.87587947e+02 6.99323921e+01
  1.87539502e+02]
 [1.88271381e+02 0.00000000e+00 7.75171148e-01 1.66386511e+02
  8.46161225e-01]
 [1.87587947e+02 7.75171148e-01 0.00000000e+00 1.65616303e+02
  7.61935378e-02]
 [6.99323921e+01 1.66386511e+02 1.65616303e+02 0.00000000e+00
  1.65549538e+02]
 [1.87539502e+02 8.46161225e-01 7.61935378e-02 1.65549538e+02
  0.00000000e+00]]

But when I use this function:

def eucDist(matrix):
    dist_mat = np.zeros((len(matrix), len(matrix)))
    for i in range(len(matrix)):
        for j in range(len(matrix)):
            dist_mat[i][j] = np.sqrt((A[i][0] - A[j][0])**2 + (A[i][1] - A[j][1])**2)
            return dist_mat

I get this incorrect result:

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

I am not sure what is wrong with the function I have written. Here is my entire code:

# Import necessary modules
import pandas as pd
import numpy as np
import datetime


# Create necessary functions
def eucDist(matrix):
    dist_mat = np.zeros((len(matrix), len(matrix)))
    for i in range(len(matrix)):
        for j in range(len(matrix)):
            dist_mat[i][j] = np.sqrt((A[i][0] - A[j][0])**2 + (A[i][1] - A[j][1])**2)
            return dist_mat


# Import data into a dataframe
df = pd.read_csv("data2.csv")
df['year'] = pd.DatetimeIndex(df['close_date']).year
df['month'] = pd.DatetimeIndex(df['close_date']).month
df['day'] = pd.DatetimeIndex(df['close_date']).day
df['hour'] = pd.DatetimeIndex(df['close_date']).hour
df['minute'] = pd.DatetimeIndex(df['close_date']).minute
print(df.head())

# Create necessary variables
x = df.as_matrix(columns=df.columns[:1])                # latitude
y = df.as_matrix(columns=df.columns[1:2])               # longitude
year = df.as_matrix(columns=df.columns[4:5])
month = df.as_matrix(columns=df.columns[5:6])
day = df.as_matrix(columns=df.columns[6:7])
hour = df.as_matrix(columns=df.columns[7:8])
min = df.as_matrix(columns=df.columns[8:9])
p = df.as_matrix(columns=df.columns[3:4])                # close_price

A = np.c_[x, y, year, month, day, hour, min, p]          # created a matrix of all the attributes needed



Dist = eucDist(A)
print(Dist)


print('-'*50)

# Get distance of latitude and longitude
D = np.zeros((len(x), len(y)))
for i in range(len(x)):
    for j in range(len(y)):
        D[i][j] = np.sqrt((x[i] - x[j])**2 + (y[i] - y[j])**2)

print(D)

I need this function to work, so if anyone can show me what I need to change, I would really appreciate it.

  • You're returning too soon. The `return` statement in `eucDist` is executed in the first iteration of the inner loop, not after the outer loop completes. Dedent it. – chepner Mar 30 '18 at 22:19
  • I see so where do I return it? –  Mar 30 '18 at 22:21
  • @chepner I got it now, sorry I am a neewbbbb –  Mar 30 '18 at 22:23

1 Answer


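You need to move the return outside both for loops. As written, the function returns during the first iteration of the inner loop, so only dist_mat[0][0] is ever computed and the rest of the matrix stays at zero: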

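def eucDist(matrix):
    dist_mat = np.zeros((len(matrix), len(matrix)))
    for i in range(len(matrix)):
        for j in range(len(matrix)):
            # Use the `matrix` argument rather than the global A
            dist_mat[i][j] = np.sqrt((matrix[i][0] - matrix[j][0])**2 + (matrix[i][1] - matrix[j][1])**2)
    # Return only after both loops have finished
    return dist_mat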
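
If the nested loops become slow on the full dataset, the same pairwise matrix can probably be built with NumPy broadcasting instead. The following is only a sketch: the name `euc_dist_vectorized` and the sample points are made up for illustration, and it assumes latitude and longitude are the first two columns, as in the matrix A from the question.

```python
import numpy as np

def euc_dist_vectorized(matrix):
    """Pairwise Euclidean distances over the first two columns."""
    # Assumption: column 0 is latitude, column 1 is longitude
    coords = np.asarray(matrix, dtype=float)[:, :2]     # shape (n, 2)
    # Broadcasting gives every row-to-row difference at once
    diff = coords[:, None, :] - coords[None, :, :]      # shape (n, n, 2)
    return np.sqrt((diff ** 2).sum(axis=-1))            # shape (n, n)

# Made-up sample points, not the asker's data
sample = np.array([[0.0, 0.0, 2018.0],
                   [3.0, 4.0, 2018.0],
                   [6.0, 8.0, 2017.0]])
print(euc_dist_vectorized(sample))
```

For these sample points the off-diagonal distances are 5, 10 and 5, and the diagonal is zero. Note that the intermediate array holds n × n × 2 values, so memory use grows quadratically with the number of rows.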

Anyway, this question has already been answered here: How can the euclidean distance be calculated with numpy?
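
For a single pair of points, one common NumPy approach is `np.linalg.norm` applied to the difference of the two vectors. A minimal sketch, with hypothetical coordinates `a` and `b` that are not taken from the asker's data:

```python
import numpy as np

# Hypothetical (latitude, longitude) pairs, for illustration only
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# The default 2-norm of the difference is the Euclidean distance
dist = np.linalg.norm(a - b)
print(dist)  # 5.0
```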

Erik Garcia