
My data.csv file has three columns, as shown below. I have converted this file to a PySpark DataFrame.

| A | B  | C |
| 1 | -3 | 4 |
| 2 | 0  | 5 |
| 6 | 6  | 6 |

I want to add another column D to the DataFrame, with the value Yes or No based on this condition: if the corresponding value in column B is greater than 0, then D is Yes; otherwise it is No.

| A | B  | C | D   |
| 1 | -3 | 4 | No  |
| 2 | 0  | 5 | No  |
| 6 | 6  | 6 | Yes |

I am not able to implement this through PySpark DataFrame operations.


1 Answer


Try something like this:

from pyspark.sql import functions as f
df.withColumn('D', f.when(f.col('B') > 0, "Yes").otherwise("No")).show()