I'll need to create an if multiple else in a pyspark dataframe.
I have two columns to be logically tested.
Logic is below:
- If Column A OR Column B contains "something", then write "X"
- Else If (Numeric Value in a string of Column A + Numeric Value in a string of Column B) > 100 , then write "X"
- Else If (Numeric Value in a string of Column A + Numeric Value in a string of Column B) > 50 , then write "Y"
- Else If (Numeric Value in a string of Column A + Numeric Value in a string of Column B) > 0 , then write "Z"
- Else, then write "T"
to a new column "RESULT"
I thought the quickest search method is when, otherwise, otherwise, otherwise, otherwise and failed in the query below.
I'd be appreciated if you suggest a quicker method for that.
Note: when(clause).when(clause).when(clause).when(clause).otherwise(clause) searches the whole table again and again. I want to proceed with unmatched data only.
df = df.withColumn('RESULT', F.when(\
F.when((F.col("A").like("%something%") | F.col("B").like("%something%")), "X").otherwise(\
F.when((((F.regexp_extract(F.col("A"), ".(\d+).", 1)) + F.regexp_extract(F.col("B"), ".(\d+).", 1)) > 100), "X").otherwise(\
F.when((((F.regexp_extract(F.col("A"), ".(\d+).", 1)) + F.regexp_extract(F.col("B"), ".(\d+).", 1)) > 29), "Y").otherwise(\
F.when((((F.regexp_extract(F.col("A"), ".(\d+).", 1)) + F.regexp_extract(F.col("B"), ".(\d+).", 1)) > 0), "Z").otherwise(\
"T"))))))