6

How to select row with max value per category in M of Power BI. Suppose we have table:

+----------+-------+------------+
| Category | Value |    Date    |
+----------+-------+------------+
| apples   |     1 | 2018-07-01 |
| apples   |     2 | 2018-07-02 |
| apples   |     3 | 2018-07-03 |
| bananas  |     7 | 2018-07-04 |
| bananas  |     8 | 2018-07-05 |
| bananas  |     9 | 2018-07-06 |
+----------+-------+------------+

Desired results are:

+----------+-------+------------+
| Category | Value |    Date    |
+----------+-------+------------+
| apples   |     3 | 2018-07-03 |
| bananas  |     9 | 2018-07-06 |
+----------+-------+------------+

Here is a start table for PBI:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Category", type text}, {"Value", Int64.Type}, {"Date", type date}})
in
    #"Changed Type"

I wonder if there is a way to come to desired results in subsequent steps within only one table, by adding some magic column IsMax:

+----------+-------+------------+-------+
| Category | Value |    Date    | IsMax |
+----------+-------+------------+-------+
| apples   |     1 | 2018-07-01 |     0 |
| apples   |     2 | 2018-07-02 |     0 |
| apples   |     3 | 2018-07-03 |     1 |
| bananas  |     7 | 2018-07-04 |     0 |
| bananas  |     8 | 2018-07-05 |     0 |
| bananas  |     9 | 2018-07-06 |     1 |
+----------+-------+------------+-------+
Przemyslaw Remin
  • 6,276
  • 25
  • 113
  • 191

3 Answers3

7

Doing a basic Group By in the Power Query Editor (group by Category and take the max over Value) gets you this table:

+----------+-------+
| Category | Value |
+----------+-------+
| apples   |     3 |
| bananas  |     9 |
+----------+-------+

Add a custom column IsMax that is simply the value 1 to this table and then merge (left outer join) it with your original table matching on both Category and Value. Finally, expand the IsMax column to get your desired table, except with null instead of 0. You can replace the null values if you choose.

Here's the M code for all those steps:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "IsMax", each 1, Int64.Type),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Category", "Value"},#"Added Custom",{"Category", "Value"},"Added Custom",JoinKind.LeftOuter),
    #"Expanded Added Custom" = Table.ExpandTableColumn(#"Merged Queries", "Added Custom", {"IsMax"}, {"IsMax"})
in
    #"Expanded Added Custom"

Edit: A slightly simplified version to reproduce the "desired result" rather than the IsMax version:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"Category", "Value"}, #"Changed Type", {"Category", "Value"}, "Grouped Rows", JoinKind.LeftOuter),
    #"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries", "Grouped Rows", {"Date"}, {"Date"})
in
    #"Expanded Grouped Rows"

Edit 2: @user11632362 pointed me at another solution that's even fewer steps.

Everything happens in the Group By step.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), Int64.Type}, {"Date", each Table.Max(_, "Value")[Date], type date}})
in
    #"Grouped Rows"

The key here is each Table.Max(_, "Value")[Date]. This sorts the subtable by Value and returns the top row of the result as a record (and the [Date] suffix returns the value in the Date field of that record).

Note that this only pulls over a single column, Date. If you need to pull in a bunch of columns it might make more sense to return the full record and expand all the desired fields in another step rather than adding more columns to the Group By step.

For example,

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6hAolKsDpIaI2Q1RlCBJFQ1xshqjKECyWA1SYl5QAhSZI6syATIAeEUNEUWyIpMgRwQTkVTZImsyAzIAeE0pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t, Col1 = _t, Col2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Category", type text}, {"Value", Int64.Type}, {"Date", type date}, {"Col1", Int64.Type}, {"Col2", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), type nullable date}, {"TopValueRow", each Table.Max(_, "Value"), type record}}),
    #"Expanded TopValueRow" = Table.ExpandRecordColumn(#"Grouped Rows", "TopValueRow", {"Date", "Col1", "Col2"}, {"Date", "Col1", "Col2"})
in
    #"Expanded TopValueRow"
Alexis Olson
  • 38,724
  • 7
  • 42
  • 64
2

I ended up getting MAX per category through index. Idea described here: https://stackoverflow.com/a/51498237/1903793

Approach #1 is one-liner snipped in R transformation:

library(dplyr)
output <- dataset %>% group_by(Category) %>% mutate(row_no_by_category = row_number(desc(Date)))

Approach #2, done completely in PBI:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Grouped rows" = Table.Group(Source, {"Category"}, {{"NiceTable", each Table.AddIndexColumn(Table.Sort(_,{{"Date", Order.Descending}} ), "Index",1,1), type table}} ),
    #"Expanded NiceTable" = Table.ExpandTableColumn(#"Grouped rows", "NiceTable", {"Value", "Date", "Index"}, {"Value", "Date", "Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded NiceTable", each ([Index] = 1))
in
    #"Filtered Rows"
Przemyslaw Remin
  • 6,276
  • 25
  • 113
  • 191
1

Another approach is to use the Remove Duplicates functionality but this requires the data to be sorted properly first so that the first row that appears is the correct one to select.

For technical reasons (see StackOverflow posts 1, 2, 3, and this article), we need to buffer the table into memory to make sure the sorting "sticks".

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Category", Order.Ascending}, {"Value", Order.Descending}}),
    #"Removed Duplicates" = Table.Distinct(Table.Buffer(#"Sorted Rows"), {"Category"})
in
    #"Removed Duplicates"

This can all be done in the GUI except for editing in the Table.Buffer wrapper in the last step.

Alexis Olson
  • 38,724
  • 7
  • 42
  • 64
  • So in the last step we just click out remove duplicates and then wrap the table with Table.Buffer? – Przemyslaw Remin Nov 15 '21 at 09:58
  • Yes. That’s correct. – Alexis Olson Nov 15 '21 at 14:52
  • It is a neat solution. I wonder what might be the drawbacks. Possibly this buffering might require lots of RAM? BTW. I noticed that ordinary Remove Duplicates, without buffering, leaves the last items which were before Sort step. So the sort step is entirely omitted and the rows are taken were the unique values appeared last. – Przemyslaw Remin Nov 15 '21 at 17:24
  • 1
    It does break query folding and buffers the table into memory, so it's not a good solution for a large query that should be performed at the source. Without the sorting step, you can't really control which rows stay and which go. You may be interested this article I wrote on this topic: https://community.powerbi.com/t5/Community-Blog/Select-Distinct-Rows-Ordered-by-Another-Column-Power-Query/ba-p/2141505 – Alexis Olson Nov 15 '21 at 17:39