5

I want to add a column with a randomly generated id to my Spark DataFrame. To do that, I'm using a UDF that calls java.util.UUID's randomUUID method, like so:

def getRandomId(s:String) : String = {
    UUID.randomUUID().toString()
}

val idUdf = udf(getRandomId(_:String))
val newDf = myDf.withColumn("id", idUdf($"colName"))

Obviously, my getRandomId function does not need an input parameter; however, I can't figure out how to create a UDF that does not take in a column as input. Is that possible in Spark?

I am using Spark 1.5.

ZygD
alexgbelov
    Possible duplicate of [Scala and Spark UDF function](http://stackoverflow.com/questions/38633216/scala-and-spark-udf-function) – Yaron Jan 26 '17 at 07:14

2 Answers

11

You can define a UDF that takes no parameters. Passing a function of type () => String to udf meets the requirement:

import org.apache.spark.sql.functions.udf
val uuid = udf(() => java.util.UUID.randomUUID().toString)

Then apply the UDF (uuid) to the DataFrame by calling it with no arguments:

val newDf = myDf.withColumn("uuid", uuid())
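
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the Spark 2.x-or-later SparkSession API and a local master rather than the Spark 1.5 setup from the question; the object name UuidColumnSketch, the helper withUuid, and the sample data are hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf

object UuidColumnSketch {
  // Zero-argument UDF: a () => String function wrapped by udf.
  val uuid = udf(() => java.util.UUID.randomUUID().toString)

  // Hypothetical helper: appends a "uuid" column to any DataFrame.
  def withUuid(df: DataFrame): DataFrame = df.withColumn("uuid", uuid())

  def main(args: Array[String]): Unit = {
    // Assumption: local SparkSession (Spark 2.x+), not the Spark 1.5 SQLContext.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("uuid-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up stand-in for myDf from the question.
    val myDf = Seq("a", "b", "c").toDF("colName")

    // Each row gets its own randomly generated id.
    withUuid(myDf).show(false)

    spark.stop()
  }
}
```

Because the ids are random, the values printed will differ on every run.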
mrsrinivas
1

You can try this:

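import java.util.UUID
import org.apache.spark.sql.functions.udf

// Zero-argument method that returns a fresh random id
def getRandomId(): String = {
  UUID.randomUUID().toString
}

val idUdf = udf(getRandomId _)
val newDf = myDf.withColumn("id", idUdf())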

The trick is that getRandomId _ turns your method into a function value of type () => String (this conversion is called eta-expansion), which is what udf expects.
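
The method-to-function conversion can be seen without Spark at all. The sketch below assumes Scala 2 syntax (the trailing underscore form is discouraged in Scala 3); the object name EtaExpansionSketch and the value asFunction are hypothetical names used only for this example.

```scala
import java.util.UUID

object EtaExpansionSketch {
  // A plain method with an empty parameter list.
  def getRandomId(): String = UUID.randomUUID().toString

  // `getRandomId _` converts the method into a Function0[String] value.
  val asFunction: () => String = getRandomId _

  def main(args: Array[String]): Unit = {
    // Each call invokes the underlying method again, so the ids differ.
    println(asFunction())
    println(asFunction())
  }
}
```

The resulting function value has the same shape as the lambda () => java.util.UUID.randomUUID().toString, so either form can be handed to udf.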

Raphael Roth