I have a DataFrame in PySpark like the one below.
df.show()
+---+----+
| id|name|
+---+----+
| 1| sam|
| 2| Tim|
| 3| Jim|
| 4| sam|
+---+----+
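(For reference, a minimal way to reproduce this frame, assuming an active SparkSession named spark:)

df = spark.createDataFrame(
    [(1, 'sam'), (2, 'Tim'), (3, 'Jim'), (4, 'sam')],
    ['id', 'name'],
)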
Now I have added a new column to the df, as shown below.
from pyspark.sql.functions import lit
from pyspark.sql.types import StringType
new_df = df.withColumn('new_column', lit(None).cast(StringType()))
Now when I query new_df:
new_df.show()
+---+----+----------+
| id|name|new_column|
+---+----+----------+
| 1| sam| null|
| 2| Tim| null|
| 3| Jim| null|
| 4| sam| null|
+---+----+----------+
Now I want to update the values in new_column based on a condition. I am trying to write the condition below but am unable to do so.
If name is 'sam', then new_column should be 'tested'; otherwise it should be 'not_tested'.
if name == 'sam':
    new_column = 'tested'
else:
    new_column = 'not_tested'
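From the docs I suspect when/otherwise from pyspark.sql.functions is the right tool, roughly like the sketch below, but I am not sure this is the correct way to update the column:

from pyspark.sql.functions import when, col, lit

# Replace new_column: 'tested' where name is 'sam', 'not_tested' for every other row
new_df = new_df.withColumn(
    'new_column',
    when(col('name') == 'sam', lit('tested')).otherwise(lit('not_tested'))
)

I expect this to give 'tested' for the two sam rows and 'not_tested' for the others, but I would like to confirm this is the idiomatic approach.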
How can I achieve this in PySpark?
Edit: I am not looking for an if/else statement, but for how to update the values of records in a PySpark column.