I have a dataset of colors, each with a date and a count.
+-----------+----------+-----+
|      color|      Date|count|
+-----------+----------+-----+
|        red|2014-05-26|    5|
|        red|2014-05-02|    1|
|        red|2015-04-02|    1|
|        red|2015-04-26|    1|
|        red|2015-09-26|    2|
|       blue|2014-05-26|    3|
|       blue|2014-06-02|    1|
|      brown|2014-07-31|    2|
|      green|2014-08-01|    2|
+-----------+----------+-----+
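I want the maximum count for each color, together with the date on which that maximum occurred. I am using Spark 2.0.2 with Java 8.
When I use the max function, the date column is dropped from the result. When I add the date column to groupBy,
I get back the same table as the input dataset, because every (color, date) pair is unique.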
df.groupBy("color").max("count").show();
+-----------+----------+
|      color|max(count)|
+-----------+----------+
|        red|         5|
|       blue|         3|
|      brown|         2|
|      green|         2|
+-----------+----------+
Expected output:
+-----------+----------+----------+
|      color|      date|max(count)|
+-----------+----------+----------+
|        red|2014-05-26|         5|
|       blue|2014-05-26|         3|
|      brown|2014-07-31|         2|
|      green|2014-08-01|         2|
+-----------+----------+----------+
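To pin down the result I am after, here is a minimal plain-Java sketch (deliberately not Spark code) that produces the expected table from the sample rows. The class `MaxPerColor`, the `Row` holder, and the helper names are hypothetical, made up for illustration only. It also assumes that when two rows of one color tie on count, the first one seen wins; my sample data has no such tie, so this is only an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain Java, no Spark: illustrates "keep the whole row with the max count per color".
class MaxPerColor {

    // Hypothetical holder for one input row (color, date, count).
    static final class Row {
        final String color;
        final String date;
        final int count;

        Row(String color, String date, int count) {
            this.color = color;
            this.date = date;
            this.count = count;
        }

        @Override
        public String toString() {
            return color + "|" + date + "|" + count;
        }
    }

    // The sample rows from the input table above.
    static List<Row> sample() {
        return Arrays.asList(
            new Row("red", "2014-05-26", 5),
            new Row("red", "2014-05-02", 1),
            new Row("red", "2015-04-02", 1),
            new Row("red", "2015-04-26", 1),
            new Row("red", "2015-09-26", 2),
            new Row("blue", "2014-05-26", 3),
            new Row("blue", "2014-06-02", 1),
            new Row("brown", "2014-07-31", 2),
            new Row("green", "2014-08-01", 2));
    }

    // For each color, keep the row with the largest count.
    // Assumption: on a tie, the row encountered first is kept.
    static Map<String, Row> maxPerColor(List<Row> rows) {
        Map<String, Row> best = new LinkedHashMap<>();
        for (Row r : rows) {
            best.merge(r.color, r, (old, cur) -> cur.count > old.count ? cur : old);
        }
        return best;
    }

    public static void main(String[] args) {
        // Prints one line per color, in first-seen order.
        for (Row r : maxPerColor(sample()).values()) {
            System.out.println(r);
        }
    }
}
```

What I am looking for is the equivalent of `maxPerColor` expressed with the Spark Dataset API, so that the date travels along with the maximum count instead of being dropped by the aggregation.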