Compare two CSVs and find closest centroid match

Question

I have two CSV files, which contain an index, X, Y, and value. It looks something like this.

X and Y coordinates are not an exact match, but only a close match. The index is assigned randomly. The total no. of rows can be more or less. Now, I want to match centroids and subtract values at the corresponding centroid to get an idea of the difference.

My main hindrance is to match centroids. Once centroids are matched between CSV1 and CSV2, I might be able to map indices and get values.

Expected output: I am mainly concerned about this value_diff.

I am using pandas to read CSV as:

file1 = pd.read_csv('CSV1')
file2 = pd.read_csv('CSV2')

I figured to map centroid I can use a Euclidean distance match. And thus created a function for calculating it:

def dist(x0, x1, y0, y1):
    return ((x1 - x0)**2 + (y1 - y0)**2)**(1/2)

After this, I am not sure how to proceed. I got a function to find the nearest value in the array using this answer and now I am trying to expand it to more than one row:

def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return idx,array[idx]

Edit: Created dataframe to reproduce input as asked in the comments:

data1 = {'index': [1, 2, 3, 4, 5],
        'X': [40, 50, 60, 70, 80],
        'Y': [10, 30, 40, 50, 60],
        'Value': [100, 200, 300, 400, 500]}
file1 = pd.DataFrame(data1)

data2 = {'index': [4, 6, 3, 10, 5, 9],
        'X': [71, 41, 81, 52, 62, 46],
        'Y': [52, 11, 62, 31, 44, 96],
        'Value': [404, 102, 511, 202, 305, 250]}
file2 = pd.DataFrame(data2)

Try and provide your sample inputs and outputs as text so that it is easy to copy-paste and reproduce the problem — Mortz, May 31 '22 at 09:36

Compare two CSVs and find closest centroid match

0 Answers0