
I have two dataframes (attached image). For each row in Table-1:

Part 1 - I need to find the row in Table-2 with the minimum Euclidean distance. Output-1 is the expected answer.

Part 2 - I need to find the row in Table-2 with the minimum Euclidean distance, with the only difference being that a row from Table-2 cannot be selected more than once. Output-2 is the expected answer.

I tried this code to get the distances, but I am not sure how to bring in the other fields:

import numpy as np
from scipy.spatial import distance

s1 = np.array([(2,2), (3,0), (4,1)])
s2 = np.array([(1,3), (2,2),(3,0),(0,1)])
print(distance.cdist(s1,s2).min(axis=1))
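
One possible way to carry the other fields along is to take `argmin` instead of `min`, so you get the position of the closest Table-2 row and can look it up. This is only a sketch: the column names `name`, `x`, `y` and the letter labels are assumptions, since the real tables are only shown in the image.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

# Hypothetical stand-ins for the two tables; the real column names are not known.
table1 = pd.DataFrame({"name": ["a", "b", "c"], "x": [2, 3, 4], "y": [2, 0, 1]})
table2 = pd.DataFrame({"name": ["d", "e", "f", "g"], "x": [1, 2, 3, 0], "y": [3, 2, 0, 1]})

# Pairwise Euclidean distances, shape (len(table1), len(table2)).
dists = distance.cdist(table1[["x", "y"]].to_numpy(), table2[["x", "y"]].to_numpy())

# Position of the closest Table-2 row for each Table-1 row.
nearest = dists.argmin(axis=1)

# Pull the matching Table-2 rows and place them next to Table-1.
matched = table2.iloc[nearest].reset_index(drop=True).add_suffix("_t2")
result = pd.concat([table1.reset_index(drop=True), matched], axis=1)
result["distance"] = dists[np.arange(len(table1)), nearest]
print(result)
```

Here `result` has one line per Table-1 row, with the closest Table-2 row's fields suffixed `_t2` and the distance in the last column.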

Two dataframes and the expected output:

screenshots

martineau
RPyML
  • For the second case, you may want to use the Hungarian algorithm: first compute all pairwise distances, then find the optimal bipartite matching. – fr_andres Jul 25 '21 at 01:26
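
The suggestion in the comment above could be sketched with SciPy's `linear_sum_assignment`, which solves this kind of assignment problem on a cost matrix. Note the assumption: this minimizes the total distance over all pairs, which is not necessarily the same as picking the nearest free row one Table-1 row at a time, and since Output-2 is only in the image it is not confirmed which of the two is wanted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial import distance

s1 = np.array([(2, 2), (3, 0), (4, 1)])
s2 = np.array([(1, 3), (2, 2), (3, 0), (0, 1)])

# Cost matrix of pairwise Euclidean distances, shape (3, 4).
cost = distance.cdist(s1, s2)

# One-to-one matching with the smallest total distance;
# each row of s2 is used at most once.
row_idx, col_idx = linear_sum_assignment(cost)

for r, c in zip(row_idx, col_idx):
    print(r, c, round(float(cost[r, c]), 2))
```

With these sample points, the third point of `s1` can no longer take `(3, 0)` because the second point already uses it, so it is paired with a farther row instead. This approach needs the full distance matrix in memory, so it is only practical for moderately sized tables.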

1 Answer


The code now gives the desired output, and there's a commented-out print statement if you want to see every pair.

It also works with lists of different lengths.

Credit also to: How can the Euclidean distance be calculated with NumPy?

Hope it helps:

import numpy
from numpy import linalg as LA

list1 = [(2,2), (3,0), (4,1)]
list2 = [(1,3), (2,2),(3,0),(0,1)]

names = range(0, len(list1) + len(list2))
names = [chr(ord('`') + number + 1) for number in names]

i = -1
j = len(list1) #Start Table2 names
for tup1 in list1:
    collector = {} #Let's collect values for each minimum check
    j = len(list1)
    i += 1
    name1 = names[i]
    for tup2 in list2:
        name2 = names[j]
        a = numpy.array(tup1)
        b = numpy.array(tup2)
#        print ("{} | {} -->".format(name1, name2), tup1, tup2, "   ", numpy.around(LA.norm(a - b), 2))
        j += 1
        collector["{} | {}".format(name1, name2)] = numpy.around(LA.norm(a - b), 2)
        if j == len(names):
            min_key = min(collector, key=collector.get)
            print (min_key, "-->" , collector[min_key])

Output:

a | e --> 0.0
b | f --> 0.0
c | f --> 1.41
thenarfer
  • Could you explain a bit more what minimum values you are looking for? These are all the combinations, but I didn't quite understand where to go from here. – thenarfer Jul 25 '21 at 01:21
  • Thanks! I am looking for the minimum Euclidean distance between each list1 element and any of the list2 elements. Based on the above output, the answers would be the pairs a | e, b | f, c | f – RPyML Jul 25 '21 at 02:28
  • Is there a way to make this more efficient? I have around 1 million rows in each of the dfs – RPyML Jul 27 '21 at 16:47
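
For the efficiency question, one option that may help with Part 1 is a k-d tree, which avoids building the full pairwise distance matrix (with a million rows on each side that matrix would not fit in memory). A minimal sketch using `scipy.spatial.cKDTree` on the sample points from the question, assuming the coordinates are low-dimensional as in the example:

```python
import numpy as np
from scipy.spatial import cKDTree

s1 = np.array([(2, 2), (3, 0), (4, 1)])
s2 = np.array([(1, 3), (2, 2), (3, 0), (0, 1)])

# Build the tree once over the Table-2 points.
tree = cKDTree(s2)

# For each Table-1 point, get the distance to and position of
# its nearest Table-2 point (Euclidean by default).
dist, idx = tree.query(s1, k=1)

print(idx)
print(np.around(dist, 2))
```

`idx` can then be used to look up the other Table-2 fields. This does not cover Part 2, where a Table-2 row may only be used once.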