I'm trying to create a flag in my dataset based on 2 conditions, the first is simple. Does CheckingCol = CheckingCol2.
The second is more complicated. I have a column called TranID and a column called RevID.
For nay row if RevID is in TranID AND CheckingCol = CheckingCol2 then the flag should return "Yes". Otherwise the flag should return "No".
My data looks like this:
TranID RevID CheckingCol CheckingCol2
1 2 ABC ABC
2 1 ABC ABC
3 6 ABCDE ABCDE
4 3 ABCDE ABC
5 7 ABCDE ABC
The expected result would be:
TranID RevID CheckingCol CheckingCol2 Flag
1 2 ABC ABC Yes
2 1 ABC ABC Yes
3 6 ABCDE ABCDE No
4 3 ABCDE ABC No
5 7 ABCDE ABC No
I've tried using:
df.withColumn("TotalMatch", when((col("RevID").contains(col("TranID"))) & (col("CheckingColumn") == col("CheckingColumn2")), "Yes").otherwise("No"))
But it didn't work, and I've not been able to find anything online about how to do this.
Any help would be great!