
I am using Spark 2.1 with PySpark. Please help me with this, as I am stuck here.

Problem statement: create a new column based on conditions on multiple columns.

Input dataframe is below

FLG1  FLG2  FLG3
T     F     T
F     T     T
T     T     F

Now I need to create one new column, FLG. The condition is: if FLG1 == T && (FLG2 == F || FLG2 == T), then FLG has to be T, else F.

Consider the above DataFrame as DF.

Below is the code snippet I tried:

DF.withColumn("FLG",DF.select(when(FLG1=='T' and (FLG2=='F' or FLG2=='T','F').otherwise('T'))).show()

It didn't work; I was getting the error "name 'when' is not defined".

Please help me get past this hurdle.

Ramesh Maharjan
user3292373

1 Answer


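Try the following; it should work. The error occurs because when was never imported, and column conditions must be combined with & and | (with each comparison in its own parentheses) rather than and / or: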

from pyspark.sql.functions import col, when, lit
DF.withColumn("FLG", when((col("FLG1")=='T') & ((col("FLG2")=='F') | (col("FLG2")=='T')),lit('F')).otherwise(lit('T'))).show()
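
For reference, here is a self-contained sketch that rebuilds the sample DataFrame and applies the rule as it is worded in the problem statement (T when the condition holds, F otherwise). That reading is an assumption: the one-liner above mirrors the question's attempted code and returns the opposite values. The names ROWS, add_flg and run_example are hypothetical, and the local SparkSession setup is only there to make the example runnable on its own.

```python
# Sample rows copied from the question (columns FLG1, FLG2, FLG3).
ROWS = [("T", "F", "T"), ("F", "T", "T"), ("T", "T", "F")]


def add_flg(df):
    """Return df with a new FLG column (hypothetical helper name)."""
    # Imported here so the module loads even where pyspark is absent.
    from pyspark.sql.functions import col, lit, when

    # & and | bind more tightly than ==, so every comparison needs
    # its own parentheses; Python's and/or cannot be used on Columns.
    cond = (col("FLG1") == "T") & (
        (col("FLG2") == "F") | (col("FLG2") == "T")
    )
    # Assumption: T when the condition holds, as the question's text says.
    return df.withColumn("FLG", when(cond, lit("T")).otherwise(lit("F")))


def run_example():
    """Build the sample DataFrame locally and return the FLG values."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("flg-example")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(ROWS, ["FLG1", "FLG2", "FLG3"])
        return [row.FLG for row in add_flg(df).collect()]
    finally:
        spark.stop()
```

Calling run_example() needs a working local Spark installation. Forgetting the inner parentheses is also a likely source of "unsupported operand type" errors, because | is then applied to the raw operands before == is evaluated.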
Ramesh Maharjan
  • Thanks for the quick reply, Ramesh. I am getting "unsupported operand type(s) for |: 'str' and 'DataFrame'". What could be the issue? – user3292373 Aug 23 '17 at 17:10
  • my pleasure @user3292373 :) and thanks for upvote and acceptance – Ramesh Maharjan Aug 23 '17 at 17:42
  • Ramesh, one more request: the number of columns and conditions is growing, so I need to create a UDF that generates one column. Inside that UDF I need to apply conditions to the columns I pass as parameters and return true or false. How do I do it? – user3292373 Aug 23 '17 at 18:09
  • you can get ideas from https://stackoverflow.com/questions/42540169/pyspark-pass-multiple-columns-in-udf – Ramesh Maharjan Aug 23 '17 at 18:16
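
For the UDF follow-up in the comments, a minimal sketch, assuming the same FLG1/FLG2 rule and string flags: the rule is written as an ordinary Python function and then wrapped with udf so it can take several columns as arguments. The names flg_rule and add_flg_with_udf are hypothetical, and the T-when-true reading of the rule is an assumption taken from the question's wording.

```python
def flg_rule(flg1, flg2):
    """Plain-Python rule: 'T' if FLG1 is T and FLG2 is T or F, else 'F'."""
    # A null column value arrives as None and falls through to 'F'.
    return "T" if flg1 == "T" and flg2 in ("T", "F") else "F"


def add_flg_with_udf(df, out_col="FLG"):
    """Return df with out_col computed by a UDF over FLG1 and FLG2."""
    # Imported here so the module loads even where pyspark is absent.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # Wrap the plain function; each Column passed below becomes one
    # positional argument of flg_rule for every row.
    flg_udf = udf(flg_rule, StringType())
    return df.withColumn(out_col, flg_udf(col("FLG1"), col("FLG2")))
```

Keeping the rule as a plain function means it can be checked without Spark, for example flg_rule("T", "F") gives "T". Python UDFs are generally slower than built-in column expressions such as when, so this is probably worth it only once the conditions become awkward to express with & and |.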