
I found this answer: Get the row corresponding to the latest timestamp in a Spark Dataset using Scala

"edate" is of date datatype.

I want the same output using Java. I tried this:

java.sql.Date yesterdayDate = yesterday();
Dataset<Row> wds = wddt.where(wddt.col("c").equalTo(yesterdayDate))
                       .groupBy("mobileno")
                       .max("edate");

but I am getting this error:

org.apache.spark.sql.AnalysisException: "edate" is not a numeric column. Aggregation function can only be applied on a numeric column.;
at org.apache.spark.sql.RelationalGroupedDataset$$anonfun$3.apply(RelationalGroupedDataset.scala:101)
at org.apache.spark.sql.RelationalGroupedDataset$$anonfun$3.apply(RelationalGroupedDataset.scala:98)
devCodePro

1 Answer


Judging from the error message and the naming, "edate" seems to be a date column and not a numeric column. That's why you get this error message: the groupBy(...).max(...) shorthand only accepts numeric columns, while the general agg(max(...)) form also works on dates.

See also: how to get max(date) from a given set of data grouped by some fields using pyspark?
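In Java, the equivalent of the linked answer's `agg(max_("edate"))` would be a statically imported `org.apache.spark.sql.functions.max`. A minimal self-contained sketch, with hypothetical sample data shaped like the question's `wddt` dataset (the real schema and values are not shown in the question):

```java
import java.sql.Date;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.max;

public class MaxDatePerGroup {
    // agg(max(col)) works on date columns, unlike groupBy(...).max(...)
    // which throws AnalysisException for non-numeric columns.
    public static Dataset<Row> latestPerMobile(Dataset<Row> df) {
        return df.groupBy("mobileno")
                 .agg(max(col("edate")).alias("edate"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("max-edate")
                .master("local[1]")
                .getOrCreate();

        // Hypothetical rows; only mobileno and edate are relevant here.
        StructType schema = new StructType()
                .add("mobileno", DataTypes.StringType)
                .add("edate", DataTypes.DateType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("111", Date.valueOf("2019-02-20")),
                RowFactory.create("111", Date.valueOf("2019-02-21")),
                RowFactory.create("222", Date.valueOf("2019-02-19")));
        Dataset<Row> wddt = spark.createDataFrame(rows, schema);

        latestPerMobile(wddt).show();
        spark.stop();
    }
}
```

The `max(String)` compile error in the comments comes from the missing static import: without `import static org.apache.spark.sql.functions.max;`, Java cannot resolve the function.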

ChristophE
  • yes, it is a date column. I want to get the latest record based on edate – devCodePro Feb 22 '19 at 16:20
  • so have you tried the solution given in the answer I linked? .agg(max_("edate"))) – ChristophE Feb 25 '19 at 08:29
  • Yes, I am getting an error max(String) is not defined and after that I create a blank method `private Map max(String string) { return null; }` Help me for the logic which is required under this? – devCodePro Feb 25 '19 at 14:03
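The asker wants the latest record per mobileno, not just the max date, so aggregating alone drops the other columns. One common way to keep the whole row is a window function with `row_number()`; a sketch under the same hypothetical schema assumptions as above (an extra column `c` stands in for the question's other fields):

```java
import java.sql.Date;
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.row_number;

public class LatestRowPerGroup {
    // Keep every column of the newest row (by edate) per mobileno.
    public static Dataset<Row> latestRow(Dataset<Row> df) {
        WindowSpec byMobile = Window.partitionBy("mobileno")
                .orderBy(col("edate").desc());
        return df.withColumn("rn", row_number().over(byMobile))
                 .filter(col("rn").equalTo(1))
                 .drop("rn");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("latest-row")
                .master("local[1]")
                .getOrCreate();

        StructType schema = new StructType()
                .add("mobileno", DataTypes.StringType)
                .add("c", DataTypes.StringType)
                .add("edate", DataTypes.DateType);
        Dataset<Row> wddt = spark.createDataFrame(Arrays.asList(
                RowFactory.create("111", "a", Date.valueOf("2019-02-20")),
                RowFactory.create("111", "b", Date.valueOf("2019-02-21")),
                RowFactory.create("222", "c", Date.valueOf("2019-02-19"))), schema);

        latestRow(wddt).show();
        spark.stop();
    }
}
```

The alternative is to compute the per-group max with `agg(max(...))` and join it back to the original dataset; the window approach avoids that extra join.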