I want to look up a value in a dataframe from an index (row) label and a column value. However, I am struggling because the column headers are the bounds of ranges: each header is the upper bound of the range that starts at the previous header, and the lower bound of the range that ends at the next header.

I cannot find a way to match my input with the column corresponding to the lower bound of the range.

An example makes this easy to see:

import pandas as pd

data_frame_test = pd.DataFrame({'location' : [1, 2, 'S'],
                                '200' : [342, 690, 103],
                                '1000' : [322, 120, 193],
                                '2000' : [249, 990, 403]})

data_frame_test = data_frame_test.set_index('location')

What I want to do is this:

location = 1
weight = 500

output = data_frame_test.iloc[1][??] #must be equal to 342

The column to look in for this weight is 200, because 500 falls in the range ]200;1000[. I don't know how else to translate that into Python code. Any help would be greatly appreciated.

Murcielago

3 Answers

First, convert the column labels to integers with rename if necessary:

data_frame_test = data_frame_test.set_index('location').rename(columns=int)
print (data_frame_test)
          200   1000  2000
location                  
1          342   322   249
2          690   120   990
S          103   193   403

weight = 500
location = 1

Then find the position of the last column whose label is less than weight (the last True in the comparison) and select the value with DataFrame.loc:

import numpy as np

# Last True position via argmax on the reversed array: https://stackoverflow.com/a/8768734
b = (data_frame_test.columns[::-1] < weight)
pos = len(b) - np.argmax(b) - 1
print (pos)
0

output = data_frame_test.loc[location, data_frame_test.columns[pos]]
print (output)
342

Alternatively, use DataFrame.iloc, getting the row position with Index.get_loc:

output = data_frame_test.iloc[data_frame_test.index.get_loc(location), pos]
print (output)
342
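
One caveat with the argmax trick above: np.argmax returns 0 for an all-False array, so a weight below every column label would quietly select the last column. A minimal sketch that guards against this, assuming the column labels are sorted in ascending order (the helper name last_column_below is hypothetical, not part of pandas or NumPy):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"location": [1, 2, "S"],
     "200": [342, 690, 103],
     "1000": [322, 120, 193],
     "2000": [249, 990, 403]}
).set_index("location").rename(columns=int)


def last_column_below(df, weight):
    """Position of the last column whose label is strictly less than weight.

    Hypothetical helper; assumes integer column labels sorted ascending.
    """
    below = np.asarray(df.columns < weight)
    if not below.any():
        # Without this check, argmax on an all-False array would return 0
        # and the formula below would pick the last column.
        raise ValueError(f"weight {weight} is not above any column label")
    # argmax on the reversed array finds the last True in the original order
    return len(below) - np.argmax(below[::-1]) - 1


pos = last_column_below(df, 500)
value = df.loc[1, df.columns[pos]]
print(pos, value)  # 0 342
```

Raising an error is only one possible choice here; returning NaN or a default column may suit some data better.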
jezrael

One option is to cast the columns to int, find the column position with Index.searchsorted, and select the row with a boolean mask on the index:

location = 1
weight = 500 

data_frame_test.iloc[data_frame_test.index==location, 
                     data_frame_test.columns.astype('int').searchsorted(weight)-1]

location
1    342
Name: 200, dtype: int64
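
What happens when the weight is exactly equal to a column label depends on the side argument of searchsorted. The sketch below wraps the lookup in a function to show both behaviours; the name lookup and the error handling are illustrative assumptions, and the column labels are assumed to be sorted in ascending order:

```python
import pandas as pd

df = pd.DataFrame(
    {"location": [1, 2, "S"],
     "200": [342, 690, 103],
     "1000": [322, 120, 193],
     "2000": [249, 990, 403]}
).set_index("location")


def lookup(df, location, weight, side="left"):
    """Hypothetical helper: value in the column whose label is the lower bound of weight.

    side="left"  -> a weight equal to a label uses the previous column
    side="right" -> a weight equal to a label uses that label's own column
    """
    bounds = df.columns.astype(int)  # assumes labels are sorted ascending
    pos = bounds.searchsorted(weight, side=side) - 1
    if pos < 0:
        # A negative position would otherwise wrap around to the last column
        raise ValueError(f"weight {weight} is below the first column label")
    return df.loc[location].iloc[pos]


print(lookup(df, 1, 500))                 # 342
print(lookup(df, 1, 1000))                # 342 (1000 treated as still in the 200 range)
print(lookup(df, 1, 1000, side="right"))  # 322 (1000 starts the next range)
```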
yatu

You can write a custom function that iterates over the columns to find the correct one and then returns the cell at that location:

def get_val(location, weight, df):
    col = df.columns[0]
    for c in df.columns:
        if weight >= int(c):
            col = c
        else:
            break
    return df.loc[location, col]
get_val(1, 500, data_frame_test)
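
For reference, here is how the function above behaves at the boundaries, run against the frame from the question (string column labels, which get_val converts with int). Note that it uses >=, so a weight equal to a label selects that label's column, and a weight below the first label falls back to the first column rather than raising:

```python
import pandas as pd

data_frame_test = pd.DataFrame(
    {"location": [1, 2, "S"],
     "200": [342, 690, 103],
     "1000": [322, 120, 193],
     "2000": [249, 990, 403]}
).set_index("location")


def get_val(location, weight, df):
    col = df.columns[0]          # default: first column
    for c in df.columns:
        if weight >= int(c):     # keep advancing while weight reaches the label
            col = c
        else:
            break
    return df.loc[location, col]


print(get_val(1, 500, data_frame_test))    # 342
print(get_val(2, 1000, data_frame_test))   # 120 (weight equal to a label uses that column)
print(get_val("S", 50, data_frame_test))   # 103 (below every label: falls back to the first column)
```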
Mohit Sharma