I have been wondering how to vectorize the function below for my dataframe. More generally, can any function be vectorized?

Here is my dataframe:

                    date    AGE
   0    28/04/2017 13:08    25
   1    28/04/2017 08:58    87
   2    03/05/2017 07:59    23
   3    03/05/2017 08:05    45
   4    04/05/2017 08:05    26
   5    05/05/2017 08:05    10
   6    06/05/2017 08:05    56
   7    07/05/2017 08:05    39

Here is the function I want to use for vectorisation:

def decision(value):
    if value > 40:
        return 1
    return 0

I do not want to use np.where, or any lambda expression.

Camue
  • Why would you ask for a vectorized approach and then say that you don't want to use `np.where()`? That doesn't make sense – roganjosh Dec 15 '19 at 11:30
  • you dont want to use `np.where()` directly on the df column?, or you dont want to use `np.where()` even inside the function eg: `def decision(value): return np.where(value>40,1,0)`? – anky Dec 15 '19 at 11:34
  • @roganjosh just because I heard its more efficient and I never really vectorized anything. so this is my beginning. – Camue Dec 15 '19 at 12:17
  • That comment totally baffles me. You want to vectorize something and you're rejecting a vectorized approach. – roganjosh Dec 15 '19 at 12:20
  • I want to use **np.vectorize** @roganjosh – Camue Dec 15 '19 at 12:23
  • @Camue perhaps read how [vectorization](https://stackoverflow.com/questions/47755442/what-is-vectorization/47755634) works, and try to operate directly on columns rather than rows if that is possible. With this use case it is possible to operate on columns – anky Dec 15 '19 at 12:27

2 Answers

Use Series.gt followed by Series.astype.

This is much faster and more efficient than the apply method. Related question: when should I use apply?

df['AGE'].gt(40).astype(int)


#def decision(age):
#    if age>40:
#        return 1
#    return 0
#    
#df['AGE'].apply(decision)
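
As a quick check, here is a minimal sketch that rebuilds the sample frame from the question (dates kept as plain strings) and compares the column-wise approach with apply. The column name `decision` and the variables `via_apply` and `same` are only illustrative names, not part of the original post.

```python
import pandas as pd

# Sample data copied from the question; dates are left as strings.
df = pd.DataFrame({
    "date": [
        "28/04/2017 13:08", "28/04/2017 08:58", "03/05/2017 07:59",
        "03/05/2017 08:05", "04/05/2017 08:05", "05/05/2017 08:05",
        "06/05/2017 08:05", "07/05/2017 08:05",
    ],
    "AGE": [25, 87, 23, 45, 26, 10, 56, 39],
})

# Column-wise comparison gives one boolean per row; astype(int) maps it to 0/1.
df["decision"] = df["AGE"].gt(40).astype(int)


def decision(value):
    if value > 40:
        return 1
    return 0


# Row-by-row version, kept only to confirm both approaches agree.
via_apply = df["AGE"].apply(decision)
same = bool((df["decision"] == via_apply).all())
```

Both versions should produce the same 0/1 column; the difference is that `gt` works on the whole column at once instead of calling a Python function per row.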
ansev
  • For correctness I deleted my answer, however I think you should expand explaining the solution and not just type "use: This...", which would not give much of an idea to the OP – FBruzzesi Dec 15 '19 at 11:53
  • This code is self-explanatory, it might be interesting to know when to use apply( check edit) I consider it a mistake to use the method apply here. On the other hand I don't think there are reasons to downvote – ansev Dec 15 '19 at 11:57
  • I actually was the first to upvote your answer sir. I am just saying that from the question I deduce that whoever posted it barely knows any pandas functionality or how vectorization works in pandas. – FBruzzesi Dec 15 '19 at 12:01
  • @ansev. Thanks but I have used apply before and I am trying to get away from **applying**. is there a way to use np.vectorize create a new column. – Camue Dec 15 '19 at 12:13
  • Is the function you plan to apply the one you have shown or is it more complex? I do not understand why not use the solution I proposed. Could you explain your problem more explicitly? – ansev Dec 15 '19 at 12:15
  • @ansev thanks for your prompt response. You got the problem perfectly. It's just that I want to use **np.vectorize** for fun as I never used and apparently its good. – Camue Dec 15 '19 at 12:21
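
On the np.vectorize request in the comments above, a minimal sketch of how the original function could be wrapped and used to create a new column. The names `vectorized_decision` and `decision` (as a column) are assumptions made for illustration, and the frame here holds only the AGE values from the question. Note that the NumPy documentation describes np.vectorize as provided primarily for convenience, not performance, since the implementation is essentially a for loop, so this is unlikely to beat the column-wise comparison.

```python
import numpy as np
import pandas as pd

# Only the AGE column from the question is needed for this sketch.
df = pd.DataFrame({"AGE": [25, 87, 23, 45, 26, 10, 56, 39]})


def decision(value):
    if value > 40:
        return 1
    return 0


# Wrap the scalar function so it accepts arrays.
# otypes fixes the output dtype instead of inferring it from the first call.
vectorized_decision = np.vectorize(decision, otypes=[np.int64])

# Pass the underlying NumPy array and store the result as a new column.
df["decision"] = vectorized_decision(df["AGE"].to_numpy())
```

This satisfies the "no np.where, no lambda" constraint, but it still calls `decision` once per element under the hood.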

You may want to use numba to get similar performance while keeping an explicit loop.

import numba
import numpy as np

@numba.jit(nopython=True, nogil=True)
def decision(arr_in: np.ndarray, decision_num: int) -> np.ndarray:
    n = arr_in.shape[0]
    decision_arr = np.empty(n, dtype=np.int64)
    # Explicit loop over the array; numba compiles it to machine code.
    for i in range(n):
        if arr_in[i] > decision_num:
            decision_arr[i] = 1
        else:
            decision_arr[i] = 0
    return decision_arr

and then create the column:

df['decision'] = decision(df['AGE'].to_numpy(), 40) 

Use .values instead of .to_numpy() if your pandas version is older than 0.24.

moshevi