I have a DataFrame with columns (A, B, C, D, E). Some rows have wrong values in C and D. I also have a Map that holds the correct values (A -> (C, D)). How can I correct the values in columns C and D?
I know the withColumn method can update the value of one column, so I used withColumn twice to update the two columns:
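import org.apache.spark.sql.functions.{col, udf}

// newValuesMap: Map[Long, (Long, Long)], i.e. A -> (correct C, correct D)
def fixC(a: Long, c: Long): Long =
  if (newValuesMap.contains(a)) newValuesMap(a)._1 else c

def fixD(a: Long, d: Long): Long =
  if (newValuesMap.contains(a)) newValuesMap(a)._2 else d

// Wrap the plain Scala functions as Spark UDFs so they can be applied to Columns
val fixCUdf = udf(fixC _)
val fixDUdf = udf(fixD _)

// withColumn returns a new DataFrame, so the two calls have to be chained
val fixed = dataFrame
  .withColumn("C", fixCUdf(col("A"), col("C")))
  .withColumn("D", fixDUdf(col("A"), col("D")))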
Is there a better way to do this, one where I don't have to call a separate fixXUdf for each column?
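A minimal sketch of one possible single-UDF approach, assuming Spark 2.x or later and that newValuesMap is a Map[Long, (Long, Long)]. The UDF returns both corrected values as a Scala tuple, which Spark stores as a struct column with fields _1 and _2; those fields are then copied back into C and D and the temporary column is dropped. The names FixColumns, fixCD, and CD, and the sample data, are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object FixColumns {
  // Hypothetical helper: one UDF call per row yields both corrected values
  def fixCD(df: DataFrame, newValuesMap: Map[Long, (Long, Long)]): DataFrame = {
    // Look up A once; fall back to the existing (C, D) when A is not in the map.
    // A UDF returning a tuple produces a struct column with fields _1 and _2.
    val fixCDUdf = udf((a: Long, c: Long, d: Long) => newValuesMap.getOrElse(a, (c, d)))
    df.withColumn("CD", fixCDUdf(col("A"), col("C"), col("D")))
      .withColumn("C", col("CD._1")) // replaces C in place, keeping column order
      .withColumn("D", col("CD._2"))
      .drop("CD")                    // remove the temporary struct column
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fix-cd").getOrCreate()
    import spark.implicits._
    // Hypothetical sample data with columns A, B, C, D, E
    val df = Seq((1L, 10L, 100L, 1L, 2L), (2L, 20L, 200L, 3L, 4L)).toDF("A", "B", "C", "D", "E")
    val newValuesMap = Map(1L -> (11L, 111L))
    fixCD(df, newValuesMap).show()
    spark.stop()
  }
}
```

The map is captured in the UDF closure and shipped to executors, which is fine for a small map; for a large one, converting it to a DataFrame and joining on A (with coalesce on C and D) would avoid the UDF altogether.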