
I am trying to populate a column named 'label' using a lambda function whose conditional logic involves two columns of the data frame. I would like to create numerical labels based on specific conditions in the 'WY' and 'WY Week' columns. For example, the label is 1 if WY is less than 2010, 2 if WY is greater than 2010, and 3 if WY is greater than 2010 and WY Week is between 26 and 40.

I don't have an issue with a single conditional on one column, as seen below:

GC['label'] = GC['WY'].apply(lambda x: 1 if x >= 1985 else 0)

But I get an error when I attempt to write a conditional statement involving two columns and multiple conditions:

CJ['label'] = CJ[['WY','WY Week']].apply(lambda x,y: 1 if x < 2010 else (2 if x >= 2010 and (y >= 26 and y <= 40)) else )

The error is a syntax error:

File "<ipython-input-21-6b6fa416588d>", line 7
CJ['label'] = CJ[['WY','WY Week'].apply(lambda x,y: 1 if x < 2010 else (2 if x >= 2010) and (y >= 26 and y <= 40) else )
                                                                                      ^
SyntaxError: invalid syntax

I feel like I'm pretty close, but I would like some assistance, as this is one of several conditional statements that I need to write like this.

rweber
  • You have nothing after the last `else`. What should the value be in that case? – Barmar Oct 20 '21 at 22:50
  • `CJ[['WY','WY Week']` You're missing a closing `]`. `CJ[['WY','WY Week']]`. Ternary operator without an else is not permitted in python `(2 if x >= 2010)` Can you outline what you are trying to accomplish? Because it is not immediately clear from the provided code. – Henry Ecker Oct 20 '21 at 22:51
  • Also, the last `else` needs to be inside the `(2 if x >= 2010 ...)` parentheses. – Barmar Oct 20 '21 at 22:51
  • I suggest you don't do this with `lambda`; you're just making it confusing by trying to put everything into one line. Define a named function and use `if` statements. – Barmar Oct 20 '21 at 22:52
  • @HenryEcker I fixed the bracket. Big oversight on my part. – rweber Oct 20 '21 at 22:52
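
Putting the comments above together, here is a minimal sketch of what a corrected one-line version might look like. The sample data is hypothetical (the real `CJ` frame is not shown), and the final value `3` is an assumption, since the question leaves the last `else` empty.

```python
import pandas as pd

# Hypothetical sample data; the real CJ frame is not shown in the question.
CJ = pd.DataFrame({
    'WY': [2005, 2012, 2012, 2010],
    'WY Week': [30, 30, 10, 41],
})

# Each conditional expression needs its own `else`, and the inner one stays
# inside the parentheses. The final value 3 is an assumption.
label = lambda row: 1 if row['WY'] < 2010 else (2 if 26 <= row['WY Week'] <= 40 else 3)

# axis=1 passes one row at a time, so both columns can be read by name.
CJ['label'] = CJ.apply(label, axis=1)
print(CJ['label'].tolist())
```

Note that `apply` calls the function with a single argument (a row, when `axis=1`), which is why the lambda takes `row` rather than `x, y`.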

2 Answers


Define a named function instead of trying to cram everything into a complex lambda.

There's no need to test x >= 2010 in the else; if it gets to the else, that must be true.

def labelval(x, y):
    if x < 2010:
        return 1
    elif 26 <= y <= 40:
        return 2
    else:
        return 3

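# apply() passes one whole column at a time by default, so go row by row
# (axis=1) and hand both values to labelval.
CJ['label'] = CJ.apply(lambda row: labelval(row['WY'], row['WY Week']), axis=1)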
Barmar
# hopefully a readable function that makes label conditions clear
def classify(wy, wy_week):
    if wy < 2010:
        return 1
    elif 26 <= wy_week <= 40:
        return 2
    else:
        return 3 # I guess?

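# map over both columns element-wise; usually faster than DataFrame.apply(axis=1),
# though it is still a Python-level loop rather than true vectorization
GC['label'] = list(map(classify, GC['WY'], GC['WY Week']))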

One of my favorite Stack Overflow answers ever: Performance of Pandas apply vs np.vectorize to create new column from existing columns
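
If performance matters, a fully vectorized alternative is `np.select`, which evaluates whole columns at once instead of calling a Python function per row. The sketch below is only an illustration: the sample data is hypothetical, and the default label `3` is the same guess as in the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real frame.
GC = pd.DataFrame({
    'WY': [2005, 2012, 2012, 2010],
    'WY Week': [30, 30, 10, 41],
})

# np.select uses the first condition that matches, so the order mirrors
# the if/elif chain in classify().
conditions = [
    GC['WY'] < 2010,
    GC['WY Week'].between(26, 40),  # inclusive on both ends by default
]
choices = [1, 2]

# default=3 is an assumption for rows that match neither condition.
GC['label'] = np.select(conditions, choices, default=3)
print(GC['label'].tolist())
```

Because the conditions are boolean Series computed on entire columns, adding further labels only means appending another condition and choice in the right order.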

Mike Holcomb
  • I didn't see @Barmar's when I submitted mine! His is good too! – Mike Holcomb Oct 20 '21 at 23:08
  • That was a great link. I'm relatively new to Python and programming as a whole, so I'm trying out new things as a trial by fire, even though they may not be the most efficient. The link was helpful, and I think I will combine @Barmar's solution with yours to complete what I am trying to do. – rweber Oct 21 '21 at 03:47