Below is the input dataframe.
+-----------+---+------+----+----+
|       DATE| ID|   sal| vat|flag|
+-----------+---+------+----+----+
|10-may-2022|  1|1000.0|12.0|   1|
|12-may-2022|  2|  50.0| 6.0|   1|
+-----------+---+------+----+----+
I want to perform the following, based on the value of the flag column.
If the flag column is 1, I will do the below.
df = srcdf.withColumn("sum",col("sal")*2)
display(df)
If the flag column is 2, I will do the below.
df = srcdf.withColumn("sum",col("sal")*4)
display(df)
Below is the code I'm using.
flag = srcdf.select(col("flag"))
if flag == 1:
    df = srcdf.withColumn("sum", col("sal") * 2)
    display(df)
else:
    df = srcdf.withColumn("sum", col("sal") * 4)
    display(df)
When I run the above, I get a syntax error. Is there another way to achieve this using PySpark conditional statements?
Thank you.