2

I know that casting a pandas DataFrame column to a list (with .tolist() or list()) and then working on the list is much slower, so I don't want to use those methods.

I want to find the index of the first element of a pandas DataFrame column that is greater than or equal to a value x (that is, >= x). If there is no such element, it should return None.

For example, if the column is this and our function is called first_greater():

    0
0   1
1  -5
2   6
3   4
4  -7
5  12
6  -2
7   0
8  -3

Then we have:

first_greater(-5) = 0
first_greater(7) = 5
first_greater(4) = 2
first_greater(6) = 2
first_greater(22) = None

I'm new to pandas and I don't know how to do this. Any help would be appreciated.

Peyman

3 Answers

4

You want to check whether any value in the column is greater than or equal to the given value, and also return the index of the first value that satisfies the condition. You have idxmax for that:

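def first_greater(df, n, col):
    # Boolean mask: True where the column value is >= n
    m = df[col].ge(n)
    # idxmax gives the label of the first True; any() guards the no-match case
    return m.any() and m.idxmax()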

Note that in the return statement, the right part of the and is only evaluated if the first condition m.any() is satisfied, otherwise False is returned.
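To illustrate why the m.any() guard matters, here is a minimal sketch (not part of the original answer). It assumes the example data from the question; the name first_greater_or_none is hypothetical and only shows how the same mask could return None instead of False.

```python
import pandas as pd

# Example column from the question
s = pd.Series([1, -5, 6, 4, -7, 12, -2, 0, -3])

# No element is >= 22, so the mask is all False
mask = s.ge(22)

# On an all-False mask every value ties, and idxmax returns the first label.
# Without the any() check this would look like a real match at index 0.
unguarded = mask.idxmax()

def first_greater_or_none(series, n):
    """Hypothetical variant: return None when nothing matches."""
    m = series.ge(n)
    return m.idxmax() if m.any() else None
```

With this variant a missing match is reported as None, as the question asks, while a genuine match at label 0 is still returned as 0.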


Let's check with the proposed examples:

first_greater(df, -5, 'col1')
# 0

first_greater(df, 7, 'col1')
# 5

first_greater(df, 4, 'col1')
# 2

first_greater(df, 6, 'col1')
# 2

first_greater(df, 22, 'col1')
# False

Input data -

    col1
0     1
1    -5
2     6
3     4
4    -7
5    12
6    -2
7     0
8    -3
yatu
  • we want to return None not False if nothing is greater – Dan Sep 06 '19 at 14:01
  • The problem with that is that nothing is returned from the function, so I think a boolean value or a string of some kind is more useful as an indicator than `None`@dan – yatu Sep 06 '19 at 14:02
  • You're welcome @Peymanmohsenikiasari. Note that the `and` can be changed into an `if else` statement, which would make it easier if you want to return `None` or a specific string. Otherwise, if `False` works, this is clean and simple – yatu Sep 06 '19 at 14:07
  • How is `first_greater(df, 6, 'col1')` 2? it should be 5 right? – moys Sep 06 '19 at 14:11
  • I was following OPs logic, that's why I'm using `ge`("greater or equal") @SH-SF – yatu Sep 06 '19 at 14:12
1
import pandas as pd

s = pd.Series([1, -5, 6, 4, -7, 12, -2, 0, -3])

def first_greater(n):
    # True where the element is >= n
    condition = (s >= n)
    if condition.any():
        # Label of the first True in the mask
        return condition.idxmax()
    else:
        return None
Dan
1

I know you already have an answer, but here is another approach to show the possibilities:

def fg(n):
    try:
        # Index label of the first row where col1 >= n
        return df.loc[df.col1.ge(n)].index[0]
    except IndexError:
        # No row matched, so the filtered index is empty
        return None
moys
  • Thank you, but I think invoking an exception can make your process much slower. – Peyman Sep 06 '19 at 14:26
  • Agreed. I was just trying something and found this approach. Since I am learning pandas myself, I thought of sharing. That's all. – moys Sep 06 '19 at 14:29
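
Following up on the exception concern in the comments above, the same filtering idea can be written without try/except. This is only a sketch: the function name fg_no_except and its col parameter are hypothetical, the data is the example from the question, and no timing was measured here.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({"col1": [1, -5, 6, 4, -7, 12, -2, 0, -3]})

def fg_no_except(df, n, col="col1"):
    """Hypothetical exception-free variant of fg."""
    # Index labels of all rows where the column value is >= n
    matches = df.index[df[col].ge(n).to_numpy()]
    # An empty result means nothing matched
    return matches[0] if len(matches) else None
```

Checking the length of the filtered index replaces the IndexError that index[0] would raise on an empty result.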