1

I have a pandas DataFrame with two columns of numbers:

index X Y
0 0.1 0.55
1 0.2 0.2
2 0.4 0.1
3 0.8 0.35
4 1 0.9

I want to find tuples of the indexes of the closest values.
For example for this dataframe I would want the result

(0,2), (1,1), (2,3), (3,4), (4,4)

I saw this link explaining how to do this for a given input, but I am wondering if there is a cleaner way to do this for my task. Thanks in Advance!

Ofek Glick
  • 999
  • 2
  • 9
  • 20

1 Answers1

1

You can try list comprehension if you are not working with a huge dataset. Subtract (.sub()) each x value from each y value, get the absolute value (.abs()) and then get the int position of the smallest value in the Series (.argmin()).

l = [(idx, df['Y'].sub(x).abs().argmin()) for idx, x in enumerate(df['X'])]
# [(0, 2), (1, 1), (2, 3), (3, 4), (4, 4)]

Update - i.e., avoid the loop

l2 = list(zip(df.index, df['X'].sub([df['Y'].values]*len(df)).abs().agg(np.argmin)))
# [(0, 2), (1, 1), (2, 3), (3, 4), (4, 4)]
It_is_Chris
  • 13,504
  • 2
  • 23
  • 41
  • First of all thanks, this does work, do you think this is the best solution in terms of efficiency? this takes approximately 3 seconds on my dataset and this needs to be done thousands of times – Ofek Glick Aug 04 '22 at 19:02
  • 1
    @OfekGlick see update to avoid the loop . . . should be quicker – It_is_Chris Aug 04 '22 at 19:17
  • Thanks for your help! I managed to actually do this faster by converting the DF to a numpy array and used reshaping and broadcasting to calculated distances faster. When time allows I'll post another answer – Ofek Glick Aug 04 '22 at 21:27