
I have the following code; df3 is created using the code below. I want to get the minimum value of distance_n and also the entire row containing that minimum value.

[screenshot: the code that creates df3]

// it gives just the min value, but I want the entire row containing that min value

[screenshot: the code that returns only the min value]

To get the entire row, I registered df3 as a table so I could query it with spark.sql.

If I run spark.sql("select latitude,longitude,speed,min(distance_n) from table1").show()

// it throws an error

[screenshot: the error]

And if I replace distance_n with distance_nd, spark.sql("select latitude,longitude,speed,min(distance_nd) from table180").show()

// it also throws an error

[screenshot: the error]

How do I resolve this and get the entire row corresponding to the minimum value?
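For context, the first error is the usual consequence of mixing non-aggregated columns (latitude, longitude, speed) with the aggregate min() without a GROUP BY. Two common ways to get the whole row with the minimum value, assuming a DataFrame df3 with those columns plus distance_n (a sketch against assumed column names, not the asker's actual data):

```scala
import org.apache.spark.sql.functions.col

// Option 1: DataFrame API — sort ascending on distance_n and take the first row.
val minRow = df3.orderBy(col("distance_n").asc).limit(1)
minRow.show()

// Option 2: Spark SQL — compute the minimum in a subquery so min() is a
// proper aggregate, then select the row(s) matching that minimum.
df3.createOrReplaceTempView("table1")
spark.sql(
  """select latitude, longitude, speed, distance_n
    |from table1
    |where distance_n = (select min(distance_n) from table1)""".stripMargin
).show()
```

Note that Option 2 returns all rows tied for the minimum, while Option 1 returns exactly one of them.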

stackoverflow

1 Answer

Before using a custom UDF, you have to register it in Spark's SQL context.

For example:

spark.sqlContext.udf.register("strLen", (s: String) => s.length())

After the UDF is registered, you can use it in your Spark SQL query:

spark.sql("select strLen(some_col) from some_table")
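Applied to the question, a minimal sketch of that flow (the distance_nd function body here is purely illustrative, not the asker's actual distance formula):

```scala
// Register a UDF named distance_nd before referencing it in SQL.
// This body is a placeholder (squared Euclidean distance from the origin).
spark.sqlContext.udf.register("distance_nd",
  (lat: Double, lon: Double) => lat * lat + lon * lon)

// Verify the registration: the name should appear in this listing.
spark.sql("show functions").show(1000, truncate = false)
```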

Reference: https://docs.databricks.com/spark/latest/spark-sql/udf-scala.html

Constantine
  • See the code above, I have updated it. How do we make it a custom UDF? min is already a function. – stackoverflow Oct 08 '18 at 10:10
  • The error that you are getting is for the distance_nd function. Have you registered it? If yes, is it present under the default DB? You can check its presence by running **show functions** in your Spark SQL. – Constantine Oct 08 '18 at 11:20