I have a Hive table with account numbers and last-updated dates. Not every account is updated every day, so I can't simply select all records from a given day. I need to group by account number, sort descending by date, and take the two most recent days for each account. My script so far:
sc.setLogLevel("ERROR")
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import org.apache.spark.sql.functions._
import sqlContext.implicits._
val df1 = sqlContext.sql("FROM mydb.mytable SELECT account_num, last_updated")
val DFGrouped = df1.groupBy("account_num").orderBy(desc("last_updated"))
I'm getting an error on the orderBy:
value orderBy is not a member of org.apache.spark.sql.GroupedData
Any idea on what I should be doing here?
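For what it's worth, a "top N per group" like this is usually done with a window function rather than groupBy, since groupBy returns a GroupedData that only supports aggregations. A minimal sketch against the df1 above (assuming Spark 1.6+, where the function is named row_number(); in 1.4–1.5 it was rowNumber(), and window functions require a HiveContext in 1.x):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{row_number, desc}

// Partition the rows by account, ordering each partition by date descending
val w = Window.partitionBy("account_num").orderBy(desc("last_updated"))

// Number the rows within each account and keep the two most recent
val latestTwo = df1
  .withColumn("rn", row_number().over(w))
  .filter($"rn" <= 2)
  .drop("rn")
```

This keeps at most two rows per account_num, ordered by last_updated, without ever collapsing the groups the way an aggregation would.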