I have a question about Spark DataFrames. I came across this code:
df.
withColumn("emailF", trim($"email")).
withColumn("emailF", regexp_replace($"emailF", " +", "")).
withColumn("emailF", lower($"emailF"))
I would prefer to use a single UDF that applies all the email-formatting rules, something like this:
val customUdf = udf((txt: String) => {
  // String has toLowerCase, not toLower
  txt.toLowerCase.trim.replaceAll(" +", "")
  // other logic
})
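To make the UDF variant concrete: the rules could live in a plain Scala function that can be checked without Spark and then wrapped with udf. This is only a rough sketch. The names EmailFormat and formatEmail are hypothetical, and the null handling and the fixed Locale.ROOT are my own assumptions, not part of the code I saw.

```scala
import java.util.Locale

object EmailFormat {
  // Hypothetical helper: same rules as the UDF body above (trim, lower-case, drop spaces).
  // Assumption: a null input is passed through as null instead of throwing.
  def formatEmail(txt: String): String =
    Option(txt)
      .map(_.trim.toLowerCase(Locale.ROOT).replaceAll(" +", ""))
      .orNull
}
```

With Spark on the classpath this could presumably be wrapped as udf(EmailFormat.formatEmail _) and applied with df.withColumn("emailF", customUdf($"email")).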
My question is: which is better, chaining multiple withColumn calls on the same column to apply the functions one at a time, or using a single function that applies all the rules inside it?
Thanks for your answers and suggestions.