
I need help figuring out how to code this. I have two filters to check against a DataFrame, and I want to assign values based on whether they match.

filters = "LIST_A in {0} AND LIST_B not in {1}".format(include_list,EXCLUDE_list) 

amount = "AND AMT_PD >= 10"

find_df = old_df.if(old_df.format(filters, amount)):
    old_df = old_df.withColumn("ID", F.lit('FOUND'))
else:
    old_df = old_df.withColumn("ID", F.lit('NOT_FOUND'))
  • Possible duplicate of [Spark Equivalent of IF Then ELSE](https://stackoverflow.com/questions/39048229/spark-equivalent-of-if-then-else) – Alper t. Turker May 10 '18 at 16:15

1 Answer


I'm not sure if this is what you really expect, since you provided only an example rather than a task description:

from pyspark.sql import functions as F

# If at least one row matches the combined filter, tag every row as FOUND
if old_df.filter('{0} {1}'.format(filters, amount)).count() > 0:
  old_df = old_df.withColumn("ID", F.lit('FOUND'))
else:
  old_df = old_df.withColumn("ID", F.lit('NOT_FOUND'))

The example above adds a column with the same value in every row: 'FOUND' if at least one record satisfies the conditions, 'NOT_FOUND' otherwise.
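One detail the question leaves open is how the filter string gets built so that the SQL IN (...) clause is valid. Here is a minimal sketch of one way to do that for the count-based check above, using made-up include_list / EXCLUDE_list values (the real lists come from your own context):

from pyspark.sql import functions as F

# Hypothetical example values
include_list = ['A1', 'A2']
EXCLUDE_list = ['B9']

# Quote each value and join with commas so the SQL IN (...) clause is valid
include_sql = ", ".join("'{}'".format(v) for v in include_list)
exclude_sql = ", ".join("'{}'".format(v) for v in EXCLUDE_list)

filters = "LIST_A in ({0}) AND LIST_B not in ({1})".format(include_sql, exclude_sql)
amount = "AND AMT_PD >= 10"

# Same count-based check as above
if old_df.filter('{0} {1}'.format(filters, amount)).count() > 0:
  old_df = old_df.withColumn("ID", F.lit('FOUND'))
else:
  old_df = old_df.withColumn("ID", F.lit('NOT_FOUND'))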

Alternatively, if you want to perform the check per record, the query will look like this:

old_df = old_df.withColumn("ID",
                           when(
                               (col('LIST_A').isin(include_list))
                                & ~(col('LIST_B').isin(EXCLUDE_list))
                                & (col('AMT_PD') >= 10),
                               'FOUND'
                           ).otherwise('NOT_FOUND')

The ID column will be calculated for each record separately, based on the values in the columns LIST_A, LIST_B and AMT_PD for that particular row.
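For a concrete picture of the per-record behaviour, here is a small self-contained sketch with made-up data (column names taken from the question, list values hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import when, col

spark = SparkSession.builder.getOrCreate()

# Toy rows just for illustration
old_df = spark.createDataFrame(
    [('A1', 'B1', 25), ('A1', 'B9', 25), ('X7', 'B1', 5)],
    ['LIST_A', 'LIST_B', 'AMT_PD'])

include_list = ['A1', 'A2']   # hypothetical
EXCLUDE_list = ['B9']         # hypothetical

old_df = old_df.withColumn(
    "ID",
    when((col('LIST_A').isin(include_list))
         & ~(col('LIST_B').isin(EXCLUDE_list))
         & (col('AMT_PD') >= 10), 'FOUND')
    .otherwise('NOT_FOUND'))

old_df.show()
# Only the first row meets all three conditions, so it gets FOUND;
# the second row fails the LIST_B exclusion, and the third fails both
# the LIST_A check and the AMT_PD threshold, so they get NOT_FOUND.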
