0

I have a table like the following:

date       | person | assignment
-----------+--------+-----------
2019-01-01 | joe    | blue
2019-02-01 | joe    | red
2019-01-01 | mel    | green
2019-03-01 | mel    | yellow
2019-02-01 | drew   | blue

And I want the following result with a Spark query

latest     | person | assignment
-----------+--------+-----------
2019-02-01 | joe    | red
2019-03-01 | mel    | yellow
2019-02-01 | drew   | blue

I know I could just do person, max(date) latest and then query the full list again. But is there an easy way to do this with a Spark windowing function.

swdev
  • 2,941
  • 2
  • 25
  • 37

1 Answers1

1
df.withColumn(
    "num",
    row_number() over Window.partitionBy("person").orderBy(desc("date")))
  .filter($"num" === 1)
  .drop("num")
  .show()
ollik1
  • 4,460
  • 1
  • 9
  • 20