Check this out
scala> val s1:Seq[(Option[Int],String)] = Seq( (Some(1),"a"), (null,"b"), (null,"c"))
s1: Seq[(Option[Int], String)] = List((Some(1),a), (null,b), (null,c))
scala> val df = s1.toDF("id","name")
df: org.apache.spark.sql.DataFrame = [id: int, name: string]
scala> val epoch = java.time.Instant.now.getEpochSecond
epoch: Long = 1539084285
scala> df.withColumn("id",when( $"id".isNull,epoch).otherwise($"id")).show
+----------+----+
| id|name|
+----------+----+
| 1| a|
|1539084285| b|
|1539084285| c|
+----------+----+
scala>
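For a standalone script (outside the spark-shell, which pulls in these imports for you), a minimal sketch of the same idea looks like this, assuming a SparkSession named spark; using None instead of null produces the same null cells in the DataFrame:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

val spark = SparkSession.builder.appName("fill-null-ids").getOrCreate()
import spark.implicits._

// None becomes a null cell in the DataFrame, just like the nulls above.
val s1: Seq[(Option[Int], String)] = Seq((Some(1), "a"), (None, "b"), (None, "c"))
val df = s1.toDF("id", "name")

// Same fixed-epoch fill as in the shell session above.
val epoch = java.time.Instant.now.getEpochSecond
df.withColumn("id", when($"id".isNull, epoch).otherwise($"id")).show()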
EDIT1:
I tried milliseconds as well, and I still get the same values. Spark doesn't capture nanoseconds in the time portion, and many rows can end up with the same millisecond, so the assumption of getting unique values from the epoch won't hold.
scala> def getEpoch(x:String):Long = java.time.Instant.now.toEpochMilli
getEpoch: (x: String)Long
scala> val myudfepoch = udf( getEpoch(_:String):Long )
myudfepoch: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,LongType,Some(List(StringType)))
scala> df.withColumn("id",when( $"id".isNull,myudfepoch('name)).otherwise($"id")).show
+-------------+----+
| id|name|
+-------------+----+
| 1| a|
|1539087300957| b|
|1539087300957| c|
+-------------+----+
scala>
The only remaining option is monotonicallyIncreasingId, but those values may not be of the same length every time.
scala> df.withColumn("id",when( $"id".isNull,myudfepoch('name)+monotonicallyIncreasingId).otherwise($"id")).show
warning: there was one deprecation warning; re-run with -deprecation for details
+-------------+----+
| id|name|
+-------------+----+
| 1| a|
|1539090186541| b|
|1539090186543| c|
+-------------+----+
scala>
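As a side note, the deprecation warning appears because monotonicallyIncreasingId has been superseded by the monotonically_increasing_id() function in org.apache.spark.sql.functions; a minimal sketch of the same expression with the newer name (same behaviour, just without the warning):

import org.apache.spark.sql.functions.{when, monotonically_increasing_id}

// Same expression as above, with the non-deprecated function name.
df.withColumn("id",
  when($"id".isNull, myudfepoch('name) + monotonically_increasing_id())
    .otherwise($"id")
).show()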
EDIT2:
I'm able to lean on System.nanoTime to get increasing ids. They won't be sequential, but the length can be kept consistent. See below.
scala> def getEpoch(x:String):String = System.nanoTime.toString.take(12)
getEpoch: (x: String)String
scala> val myudfepoch = udf( getEpoch(_:String):String )
myudfepoch: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
scala> df.withColumn("id",when( $"id".isNull,myudfepoch('name)).otherwise($"id")).show
+------------+----+
| id|name|
+------------+----+
| 1| a|
|186127230392| b|
|186127230399| c|
+------------+----+
scala>
Try this out when running on a cluster, and adjust the take(12) if you get duplicate values.
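One quick way to sanity-check uniqueness on the cluster (not from the original session; df2 is just an assumed name for the transformed DataFrame) is to count repeated ids:

// df2 is assumed to be the result of the withColumn call above.
val df2 = df.withColumn("id", when($"id".isNull, myudfepoch('name)).otherwise($"id"))

// Any id that appears more than once means take(12) needs adjusting.
df2.groupBy("id").count.filter($"count" > 1).show()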