
I have a DataFrame with columns (A, B, C, D, E). Some rows have wrong values for C and D. I also have a Map with the correct values (A -> (C, D)). How can I correct the values in columns C and D?

I know we can use the withColumn method to update the value of one column, so I used withColumn twice to update the two columns.

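import org.apache.spark.sql.functions.{col, udf}

// newValuesMap: Map[Long, (Long, Long)] maps A -> (correct C, correct D)
val fixCUdf = udf { (a: Long, c: Long) =>
  // Use the corrected C if A has an entry, otherwise keep the current value
  newValuesMap.get(a).map(_._1).getOrElse(c)
}

val fixDUdf = udf { (a: Long, d: Long) =>
  // Same lookup, but take the corrected D
  newValuesMap.get(a).map(_._2).getOrElse(d)
}

// withColumn returns a new DataFrame, so chain the calls and keep the result
val fixed = dataFrame
  .withColumn("C", fixCUdf(col("A"), col("C")))
  .withColumn("D", fixDUdf(col("A"), col("D")))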

Is there a better way of doing this, one where I don't have to define and call a separate fixXUdf for each column?
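One direction that might avoid the per-column UDFs, as a rough sketch only: turn the map into a small lookup DataFrame, left-join it on A, and let coalesce pick the corrected value when one exists. The names FixColumns, fixCD, newC and newD are made up for illustration, and this assumes the map is small enough to broadcast and that A is a Long.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{broadcast, coalesce, col}

object FixColumns {
  // Hypothetical helper: replaces C and D with the values from `fixes`
  // wherever A has an entry; rows without an entry keep their values.
  def fixCD(spark: SparkSession, df: DataFrame,
            fixes: Map[Long, (Long, Long)]): DataFrame = {
    import spark.implicits._

    // Flatten A -> (C, D) into rows of a lookup DataFrame.
    // newC / newD are temporary column names chosen for this sketch.
    val fixesDf = fixes.toSeq
      .map { case (a, (c, d)) => (a, c, d) }
      .toDF("A", "newC", "newD")

    // Left join keeps every original row; newC/newD are null when A has no fix,
    // so coalesce falls back to the existing C and D.
    df.join(broadcast(fixesDf), Seq("A"), "left")
      .withColumn("C", coalesce(col("newC"), col("C")))
      .withColumn("D", coalesce(col("newD"), col("D")))
      .drop("newC", "newD")
  }
}
```

With this, something like FixColumns.fixCD(spark, dataFrame, newValuesMap) would return the corrected DataFrame. It still touches C and D separately, but with built-in column expressions rather than one UDF per column.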
