
I am calling a udf on a specific column of my dataframe, in which I check whether all values are valid per a specified date format.

sourcefile = sourcefile.withColumn(column, DateConversion(col(column)))

Here DateConversion is my udf. My question is: is there a way to pass the valid date format "yyyy/MM/dd" as a string to this udf, so it can be used internally for validation purposes?

I was trying

sourcefile = sourcefile.withColumn(column, DateConversion(col(column),"yyyy/MM/dd"))

But this gives an error.

zero323
Osy
  • [this](https://stackoverflow.com/questions/44361332/add-number-of-days-column-to-date-column-in-same-dataframe-for-spark-scala-app) will give you a good start. – Ramesh Maharjan Jun 05 '17 at 13:55

2 Answers


You can use the lit function to create a literal column and pass it to the udf.

val udfName = udf { (name: String, value: String) =>
  name + value
}

Use the lit() function when calling the udf:

dataframe.withColumn("colName", udfName($"firstName", lit("xyz")))
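Applied to the original question, the same pattern could look like the sketch below. The validation body is an assumption (strict parsing via SimpleDateFormat, returning null for invalid values), not code from the question:

```scala
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.{col, lit, udf}

// udf taking the cell value plus the format as a second (literal) column.
// Illustrative body: keep the value if it parses strictly against the
// pattern, otherwise return null.
val dateConversion = udf { (value: String, format: String) =>
  val parser = new SimpleDateFormat(format)
  parser.setLenient(false) // reject dates like 2017/13/40
  try { parser.parse(value); value } catch { case _: Exception => null }
}

// Wrap the format in lit() so it becomes a literal Column:
sourcefile = sourcefile.withColumn(column, dateConversion(col(column), lit("yyyy/MM/dd")))
```

This assumes sourcefile is a var, mirroring the reassignment style in the question.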
koiralo

You can just curry the udf, passing in the date format - or really any other argument you want - when the udf is created.

def getUdf(format: String) = udf { date: String =>
  /* some logic that uses format */
}

And then call that method like so:

val dateConversion = getUdf("yyyy/MM/dd")
sourcefile = sourcefile.withColumn(column, dateConversion(col(column)))

This also lets you easily swap out the date conversion format by changing the argument passed to getUdf, instead of having the format hardcoded inside the udf.
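Filling in the placeholder with a concrete validation body (an assumption on my part; the answer leaves the logic unspecified), the curried version could look like:

```scala
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.{col, udf}

// The format string is captured by the closure when the udf is created,
// so only the date column needs to be passed at call time.
def getUdf(format: String) = udf { date: String =>
  val parser = new SimpleDateFormat(format)
  parser.setLenient(false) // strict parsing: reject out-of-range fields
  try { parser.parse(date); date } catch { case _: Exception => null }
}

val dateConversion = getUdf("yyyy/MM/dd")
sourcefile = sourcefile.withColumn(column, dateConversion(col(column)))
```

Each call to getUdf produces an independent udf, so several formats can coexist by creating one udf per format.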

Davis Broda