
My data.csv file has three columns, as shown below. I have converted this file to a PySpark DataFrame.

| A | B  | C |
| 1 | -3 | 4 |
| 2 | 0  | 5 |
| 6 | 6  | 6 |

I want to add another column D to the DataFrame, with the value Yes or No based on this condition: if the corresponding value in column B is greater than 0, then D is Yes; otherwise it is No.

| A | B  | C | D   |
| 1 | -3 | 4 | No  |
| 2 | 0  | 5 | No  |
| 6 | 6  | 6 | Yes |

I am not able to implement this through PySpark DataFrame operations.


1 Answer


Try something like this:

from pyspark.sql import functions as f
df.withColumn('D', f.when(f.col('B') > 0, "Yes").otherwise("No")).show()