pandas find closest value from other column to entire column

Question

I have a pandas DataFrame with two columns of numbers:

index	X	Y
0	0.1	0.55
1	0.2	0.2
2	0.4	0.1
3	0.8	0.35
4	1	0.9

I want to find tuples of the indexes of the closest values.
For example for this dataframe I would want the result

(0,2), (1,1), (2,3), (3,4), (4,4)

I saw this link explaining how to do this for a given input, but I am wondering if there is a cleaner way to do this for my task. Thanks in Advance!

It_is_Chris · Accepted Answer · 2022-08-04T19:34:46.767

1

You can try list comprehension if you are not working with a huge dataset. Subtract (.sub()) each x value from each y value, get the absolute value (.abs()) and then get the int position of the smallest value in the Series (.argmin()).

l = [(idx, df['Y'].sub(x).abs().argmin()) for idx, x in enumerate(df['X'])]
# [(0, 2), (1, 1), (2, 3), (3, 4), (4, 4)]

Update - i.e., avoid the loop

l2 = list(zip(df.index, df['X'].sub([df['Y'].values]*len(df)).abs().agg(np.argmin)))
# [(0, 2), (1, 1), (2, 3), (3, 4), (4, 4)]

edited Aug 04 '22 at 19:34

answered Aug 04 '22 at 18:41

It_is_Chris

13,504
2
23
41

First of all thanks, this does work, do you think this is the best solution in terms of efficiency? this takes approximately 3 seconds on my dataset and this needs to be done thousands of times – Ofek Glick Aug 04 '22 at 19:02
1

@OfekGlick see update to avoid the loop . . . should be quicker – It_is_Chris Aug 04 '22 at 19:17
Thanks for your help! I managed to actually do this faster by converting the DF to a numpy array and used reshaping and broadcasting to calculated distances faster. When time allows I'll post another answer – Ofek Glick Aug 04 '22 at 21:27

pandas find closest value from other column to entire column

1 Answers1

Update - i.e., avoid the loop