I am trying to use an if condition inside a Python function and then use it to do some calculations with DataFrame values.
# Initial data
+---+----+----+------+
| id|team|game|result|
+---+----+----+------+
| 1| A|Home| |
| 2| A|Away| |
| 3| B|Home| |
| 4| B|Away| |
| 5| C|Home| |
| 6| C|Away| |
| 7| D|Home| |
| 8| D|Away| |
+---+----+----+------+
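For reference, here is roughly how the DataFrame is built (a minimal sketch; I am assuming string columns and an empty string for result, the real data may differ):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample rows matching the table above; "result" starts out empty
data = [
    (1, "A", "Home", ""), (2, "A", "Away", ""),
    (3, "B", "Home", ""), (4, "B", "Away", ""),
    (5, "C", "Home", ""), (6, "C", "Away", ""),
    (7, "D", "Home", ""), (8, "D", "Away", ""),
]
df = spark.createDataFrame(data, ["id", "team", "game", "result"])
df.show()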
### I want to replace the value in the result column, so I tried using a function:
from pyspark.sql.functions import col

def replace_result(team_name, game_kind, result):
    if col('team') == team_name and col('game') == game_kind:
        return result
    else:
        return col('result')

df = df.withColumn('result', replace_result('A', 'Away', '0-1'))
but it gave me this error:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
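Reading the error message, it seems the condition has to be built as a column expression instead of being evaluated with a Python if. Something like the sketch below, rewriting my function with when/otherwise, looks like it might be the intended pattern, though I am not sure it is the right approach:

from pyspark.sql import functions as F

# same idea as replace_result, but returning a column expression:
# when team and game match, use the new result, otherwise keep the old value
def replace_result(team_name, game_kind, result):
    return F.when(
        (F.col('team') == team_name) & (F.col('game') == game_kind), result
    ).otherwise(F.col('result'))

df = df.withColumn('result', replace_result('A', 'Away', '0-1'))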
My question is:
Is it possible to use Python if conditions with PySpark DataFrame columns, or do I have to build column expressions like the one above?
Thanks