
I found this answer: Get the row corresponding to the latest timestamp in a Spark Dataset using Scala

"edate" is of date datatype.

I want the same output using Java. I tried this:

java.sql.Date yesterdayDate = yesterday();
Dataset<Row> wds = wddt.where(wddt.col("c").equalTo(yesterdayDate))
                       .groupBy("mobileno")
                       .max("edate");

but I am getting this error:

org.apache.spark.sql.AnalysisException: "edate" is not a numeric column. Aggregation function can only be applied on a numeric column.;
at org.apache.spark.sql.RelationalGroupedDataset$$anonfun$3.apply(RelationalGroupedDataset.scala:101)
at org.apache.spark.sql.RelationalGroupedDataset$$anonfun$3.apply(RelationalGroupedDataset.scala:98)
devCodePro

1 Answer


Judging from the error message and the naming, "edate" seems to be a date column and not a numeric column. That's why you get this error message: the groupBy(...).max(...) shorthand only accepts numeric columns, while the general agg(max(...)) form also works on dates.

See also: how to get max(date) from a given set of data grouped by some fields using pyspark?
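In Java, the equivalent of the linked answer's `agg(max_("edate"))` would be a statically imported `org.apache.spark.sql.functions.max`. A minimal self-contained sketch, with hypothetical sample data shaped like the question's `wddt` dataset (the real schema and values are not shown in the question):

```java
import java.sql.Date;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.max;

public class MaxDatePerGroup {
    // agg(max(col)) works on date columns, unlike groupBy(...).max(...)
    // which throws AnalysisException for non-numeric columns.
    public static Dataset<Row> latestPerMobile(Dataset<Row> df) {
        return df.groupBy("mobileno")
                 .agg(max(col("edate")).alias("edate"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("max-edate")
                .master("local[1]")
                .getOrCreate();

        // Hypothetical rows; only mobileno and edate are relevant here.
        StructType schema = new StructType()
                .add("mobileno", DataTypes.StringType)
                .add("edate", DataTypes.DateType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("111", Date.valueOf("2019-02-20")),
                RowFactory.create("111", Date.valueOf("2019-02-21")),
                RowFactory.create("222", Date.valueOf("2019-02-19")));
        Dataset<Row> wddt = spark.createDataFrame(rows, schema);

        latestPerMobile(wddt).show();
        spark.stop();
    }
}
```

The `max(String)` compile error in the comments comes from the missing static import: without `import static org.apache.spark.sql.functions.max;`, Java cannot resolve the function.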

ChristophE
  • yes, it is a date column. I want to get the latest record based on edate – devCodePro Feb 22 '19 at 16:20
  • so have you tried the solution given in the answer I linked? .agg(max_("edate"))) – ChristophE Feb 25 '19 at 08:29
  • Yes, I am getting an error max(String) is not defined and after that I create a blank method `private Map max(String string) { return null; }` Help me for the logic which is required under this? – devCodePro Feb 25 '19 at 14:03
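The asker wants the latest record per mobileno, not just the max date, so aggregating alone drops the other columns. One common way to keep the whole row is a window function with `row_number()`; a sketch under the same hypothetical schema assumptions as above (an extra column `c` stands in for the question's other fields):

```java
import java.sql.Date;
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.row_number;

public class LatestRowPerGroup {
    // Keep every column of the newest row (by edate) per mobileno.
    public static Dataset<Row> latestRow(Dataset<Row> df) {
        WindowSpec byMobile = Window.partitionBy("mobileno")
                .orderBy(col("edate").desc());
        return df.withColumn("rn", row_number().over(byMobile))
                 .filter(col("rn").equalTo(1))
                 .drop("rn");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("latest-row")
                .master("local[1]")
                .getOrCreate();

        StructType schema = new StructType()
                .add("mobileno", DataTypes.StringType)
                .add("c", DataTypes.StringType)
                .add("edate", DataTypes.DateType);
        Dataset<Row> wddt = spark.createDataFrame(Arrays.asList(
                RowFactory.create("111", "a", Date.valueOf("2019-02-20")),
                RowFactory.create("111", "b", Date.valueOf("2019-02-21")),
                RowFactory.create("222", "c", Date.valueOf("2019-02-19"))), schema);

        latestRow(wddt).show();
        spark.stop();
    }
}
```

The alternative is to compute the per-group max with `agg(max(...))` and join it back to the original dataset; the window approach avoids that extra join.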