My dataframe looks similar to this example below (just with way more entries). I want to obtain the nearest upper and lower number for a given value, for each group.
a b
600 10
600 12
600 15
600 17
700 8
700 11
700 19
For example for a value of 13. I would like to obtain a new dataframe similar to:
a b
600 12
600 15
700 11
700 19
I already tried the solution from Ivo Merchiers in How do I find the closest values in a Pandas series to an input number? using groupby and apply to run it for the different groups.
def find_neighbours(value):
exactmatch=df[df.num==value]
if !exactmatch.empty:
return exactmatch.index
else:
lowerneighbour_ind = df[df.num<value].num.idxmax()
upperneighbour_ind = df[df.num>value].num.idxmin()
return [lowerneighbour_ind, upperneighbour_ind]
df=df.groupby('a').apply(find_neighbours, 13)
But since my dataset has around 16 million lines this procedure takes extremely long. Is there possibly a faster way to obtain a solution?
Edit Thanks for your answers. I forgot to add some info. If a close number appears multiple times I would like to have all lines transfered to the new dataframe. And when there is only one upper (lower) and no lower (upper) neighbour, this lines should be ignored.
a b
600 10
600 12
600 15
600 17
700 8
700 11
700 19
800 14
800 15
900 12
900 14
900 14
Leads for 13 to this:
a b
600 12
600 15
700 11
700 19
900 12
900 14
900 14
Thanks for your help!