2

I tried different options but I always return to the .get_loc function. I got a big data frame and need to find the row index of a value nearest or backfill. The df looks like this:

     Date     Product     Price
 0   1/1      NEG         3
 1   1/1      NEG         3.3
 2   1/1      NEG         5.1
 3   1/1      POS         1.4
 4   1/1      POS         3.7
 5   1/1      POS         3.9
 6   1/1      POS         4.6
 7   1/2      NEG         1.2
 8   ...      ...         ...

df.columns.get_loc('Price') gives me 2 for the index of the column 'Price', but I need the index of a special row by section ('Date' and 'Product'), e.g:

df.loc[(df)['Date']=='1/1' & (df['Product']=='NEG')]

now, search Price == 3.4:

pd.Index(df.Price).get_loc(3.4, 'nearest')

This would give me index=1, but it does not work because data is to big, there are multiple '3.4'.

Is there any way to search for the nearest value with certain conditions, like above?

1 Answers1

0

welcome to Stackoverflow!

I'm not a fan of using .get_loc() so here's an alternative method to get what you want.

import pandas as pd

num = 3.4

# New dataframe fit_criteria for conditions (df['Date']=='1/1') & (df['Product']=='NEG')
fit_criteria = df.loc[(df['Date']=='1/1') & (df['Product']=='NEG')]

# Find absolute difference between values in price column and num. Find the index of
# the smallest difference using .idxmin()
nearest_to_num = (fit_criteria['Price']-num).abs().idxmin()

# Final result is the index of nearest number to num
nearest_to_num

If the comments aren't sufficient, here's a more verbose explanation of what's going on:

  1. First we define the number we want to find the nearest number to with
    num = 3.4
    
  2. Next we create a dataframe that fits the criteria of Date = 1/1 and Product = Neg by passing them as conditions in .loc[].

    fit_criteria = df.loc[(df['Date']=='1/1') & (df['Product']=='NEG')]
    
  3. We then generate a dataframe of the absolute difference between num and the values in the column price. Finally the .idxmin() method is used which returns the index of the first minimum value

    nearest_to_num = (fit_criteria['Price']-num).abs().idxmin()
    
  4. At the end of all this, nearest_to_num has a value of 1, corresponding to the index of the row you want.

Do note that this approach does not account for multiple values that are equally near to num. I hope this answers your question sufficiently but feel free to let me know if you require more details or clarification.


Reference used: How do I find the closest values in a Pandas series to an input number?

evantkchong
  • 2,251
  • 3
  • 14
  • 29
  • thanks for the answer! I just added `(df['Price'] > num)` to the fit_criteria df and called the needed index with `index = fit_criteria.index.values.astype(int)[0] `. In that way I also get `backfill` (first) value with these conditions. Or do you know how to do it in a more elegant way? Anyway, I will try it now, thanks! – Jannic Nagel Feb 22 '19 at 16:58
  • Unfortunately I'm not familiar with any shorter way to achieve the same result. Sorry! – evantkchong Feb 25 '19 at 06:04