I have the following DataFrame:

example = pd.DataFrame({"dirr": [1, 0, -1, -1, 1, -1, 0],
                        "value": [125, 130, 80, 8, 150, 251, 18],
                        "result": [np.nan for _ in range(7)]})

I would like to perform the following operation with cummin() and cummax() on it:

example["result"].apply(lambda x : x= example["value"].cummax() if example["dirr"]==1
                           else x= example["value"].cummin() if example["dirr"]==-1
                           else x= NaN if if example["dirr"]==0
                              )

This returns `error: invalid syntax`.

Could anyone help me straighten this out?

This is the intended output:

example = pd.DataFrame({"dirr": [1, 0, -1, -1, 1, -1, 0],
                        "value": [125, 130, 80, 8, 150, 251, 18],
                        "result": [125, np.nan, 80, 8, 150, 8, np.nan]})

EDIT:

So as per the answer of @su79eu7k the following function would do:

def calc(x):
    if x['dirr'] == 1:
        return np.diag(example["value"].cummax())
    elif x['dirr'] == -1:
        return np.diag(example["value"].cummin())
    else:
        return np.nan

I should be able to put that into a lambda, but I am still blocked by a syntax error that I cannot spot:

example["result"]=example.apply(lambda x : np.diag(x["value"].cummax()) if x["dirr"]==1
                               else np.diag(x["value"].cummin()) if x["dirr"]==-1
                               else NaN if x["dirr"]==0
                              )

A final little nudge from you guys would be hugely appreciated.

Julien Marrec
jim jarnac

3 Answers

I think it makes the most sense to use separate lines instead of an apply. If you do use the apply function, you should create a separate function and pass it through rather than making a three-line lambda.

example.loc[example['dirr'] == 1, 'result'] = \
            example.loc[example['dirr'] == 1, 'value'].cummax()
example.loc[example['dirr'] == -1, 'result'] = \
            example.loc[example['dirr'] == -1, 'value'].cummin()

>>> example
   dirr  result  value
0     1   125.0    125
1     0     NaN    130
2    -1    80.0     80
3    -1     8.0      8
4     1   150.0    150
5    -1     8.0    251
6     0     NaN     18
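
To make the indexing above more concrete, here is a minimal sketch of what the boolean mask does. It rebuilds the `example` frame from the question; the names `is_up` and `up_values` are only illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({"dirr": [1, 0, -1, -1, 1, -1, 0],
                        "value": [125, 130, 80, 8, 150, 251, 18],
                        "result": [np.nan] * 7})

# A boolean Series with one True/False per row (illustrative name)
is_up = example["dirr"] == 1

# .loc[mask, column] keeps only the rows where the mask is True
up_values = example.loc[is_up, "value"]

# cummax() runs over that subset only and keeps the original row labels,
# so the assignment lines up with the same rows on the left-hand side
example.loc[is_up, "result"] = up_values.cummax()

print(example)
```

Because the running maximum is taken over the selected rows only, rows with other `dirr` values never feed into it.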

Alternate apply approach below.

# Start from -inf/+inf so that any value updates the running max/min
current_max = float('-inf')
current_min = float('inf')

def func(df):
    global current_max
    global current_min
    if df['dirr'] == 1:
        current_max = max(current_max, df['value'])
        return current_max
    elif df['dirr'] == -1:
        current_min = min(current_min, df['value'])
        return current_min
    else:
        return np.nan

example['result'] = example.apply(func, axis=1)
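
If the globals feel awkward, one possible variation is to keep the running state in a closure. This is only a sketch following the same logic as `func` above; `make_tracker` and `state` are hypothetical names, and starting from infinities is an assumption made so that any value updates the running extremes.

```python
import math

import numpy as np
import pandas as pd

example = pd.DataFrame({"dirr": [1, 0, -1, -1, 1, -1, 0],
                        "value": [125, 130, 80, 8, 150, 251, 18]})


def make_tracker():
    """Return a row function that remembers its own running max and min."""
    # Hypothetical state holder; lives only inside this closure
    state = {"max": -math.inf, "min": math.inf}

    def func(row):
        if row["dirr"] == 1:
            state["max"] = max(state["max"], row["value"])
            return state["max"]
        if row["dirr"] == -1:
            state["min"] = min(state["min"], row["value"])
            return state["min"]
        return np.nan

    return func


example["result"] = example.apply(make_tracker(), axis=1)
print(example)
```

Calling `make_tracker()` again gives a fresh state, so the same code can be reused on another frame without resetting anything by hand.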
3novak
  • Thanks for your reply, it does work. However, I really don't understand the logic: are you passing a list of bools as a positional argument? How would you go about it if you were to make it a function and pass it through apply()? – jim jarnac Jan 02 '17 at 03:55
  • Correct, we index the portion of the dataframe for assignment, and then we subset the dataframe for the portion that influences the data returned. I've edited my post for an alternate apply function, but the global variables make it a little squirrelly. – 3novak Jan 02 '17 at 04:35
  • I can do the same with pandas' mask(). This is what I used to have, but I don't like the syntax and want to define the "result" value in 1 row. – jim jarnac Jan 02 '17 at 06:16

I think @3novak's solution is simple and fast. But if you really want to use the apply function:

def calc(x):
    if x['dirr'] == 1:
        return example["value"].cummax()
    elif x['dirr'] == -1:
        return example["value"].cummin()
    else:
        return np.nan

example['result']  = np.diag(example.apply(calc, axis=1))

print(example)

   dirr  result  value
0     1   125.0    125
1     0     NaN    130
2    -1    80.0     80
3    -1     8.0      8
4     1   150.0    150
5    -1     8.0    251
6     0     NaN     18
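
On the syntax error itself, a minimal sketch in plain Python (no pandas) may help. A conditional expression has the form `a if cond else b`, and a chain of them has to end with a bare `else` value; finishing on `... if cond` leaves the last expression incomplete. The lambda and the string labels below are illustrative stand-ins, not the code from the question.

```python
# Chained conditional expression: the last branch is a plain `else` value
label = lambda d: "cummax" if d == 1 else "cummin" if d == -1 else "nan"

# Same chain, but ending on `if d == 0` with no closing `else`. It is kept
# as a string so that this snippet itself still parses.
broken_source = (
    'lambda d: "cummax" if d == 1 else "cummin" if d == -1 else "nan" if d == 0'
)

try:
    compile(broken_source, "<sketch>", "eval")
    parsed = True
except SyntaxError:
    parsed = False

print([label(d) for d in (1, -1, 0)])
print("broken version parsed:", parsed)
```

The `else` branch then takes care of every remaining case, so no trailing `if` is needed for `dirr == 0`.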
su79eu7k
  • Thank you, this is very interesting. From what I see, the function you created is essentially the same as the one I mention in my question, except for 2 things: 1) The syntax works! Why is it not working in my question? (I tried replacing the second `else` with `elif` but still got an error.) 2) The np.diag() function. What does it do exactly? Ideally I would like to keep my original lambda function. I think 3 lines for it is OK, and it makes the code clearer. – jim jarnac Jan 02 '17 at 05:00
  • https://google.github.io/styleguide/pyguide.html?showone=Lambda_Functions#Lambda_Functions and http://stackoverflow.com/questions/14029245/python-putting-an-if-elif-else-statement-on-one-line address the cons and impossibilities of the desired lambda. – 3novak Jan 02 '17 at 05:18
  • @su79eu7k: OK, I see, I cannot use a *statement* in an if/else in a lambda. Regarding my second question, why is there a `numpy.diag()` function? Why is it not just `example['result'] = example.apply(calc, axis=1)`? – jim jarnac Jan 02 '17 at 05:42
  • Actually, sticking closer to the expression used by @su79eu7k, I rephrased my lambda to `example[result].apply(lambda x : example["value"].cummax() if x["dirr"]==1 ...`. There is no longer a statement in the lambda, but it still returns `error:invalid syntax`. Why? PS: not interested in opinions on why a 16-line function is better... – jim jarnac Jan 02 '17 at 05:54
  • @3novak `example.apply(calc, axis=1)` generates a DataFrame of calculation results that consists of cummax or cummin values based on dirr. Finally, `np.diag` picks the diagonal entries of it. It might be helpful to try `print(example.apply(calc, axis=1))` to understand this. – su79eu7k Jan 02 '17 at 06:16
  • @su79eu7k That's really great, thanks! Can you have a look at the edit I made to the question? Thank you – jim jarnac Jan 02 '17 at 06:47

All numpy

v = example.value.values
d = example.dirr.values
mx = np.maximum.accumulate(v)
mn = np.minimum.accumulate(v)
example['result'] = np.where(d == 1, mx, np.where(d == -1, mn, np.nan))
example

   dirr  result  value
0     1   125.0    125
1     0     NaN    130
2    -1    80.0     80
3    -1     8.0      8
4     1   150.0    150
5    -1     8.0    251
6     0     NaN     18


piRSquared