
How can we check which of two DAX queries performs better using DAX Studio? In the example the queries return exactly the same results; however, the stats differ and give no clear hints. What useful information can we glean from comparing the two queries?

Comparison query stats summary:

+--------------------------+--------------------+---------+---------+
|                          |                    | Query 1 | Query 2 |
+--------------------------+--------------------+---------+---------+
| Server timings           | Total (ms)         |       7 |       5 |
|                          | SE CPU (ms)        |       0 |       0 |
|                          | FE (ms)            |       6 |       4 |
|                          | SE (ms)            |       1 |       1 |
|                          | SE queries (count) |       3 |       2 |
|                          | SE cache (count)   |       0 |       0 |
+--------------------------+--------------------+---------+---------+
| Query plan, no. of lines | physical           |       7 |      28 |
|                          | logical            |      13 |       9 |
+--------------------------+--------------------+---------+---------+
  • The second query is quicker but has a bushy, longer plan (2 scans).
  • The first query has longer server timings but a cleaner, shorter query plan (3 scans).

So the server timings favor the second query, but its complex query plan raises concern. Knowing the stats and query plans, what can we expect if the SearchTable had millions of rows? Shouldn't we prefer simpler query plans, since future DAX optimizations may work in their favor?

Sample data. We have two tables SearchTable and ThisTable:

SearchTable = 
DATATABLE (
    "Category", STRING,
    "Product", STRING,
    "Amount", INTEGER,
    {
        { BLANK ()      , "apple"       , 1 },
        { "Fruits"      , "apple"       , 1 },  -- watch out for multiple apples!
        { "Yummy Fruits", "apple"       , 2 },
        { "Fruits"      , "banana"      , 4 },
        { "Fruits"      , "wolfberry"   , 5 },
        { "Fruits"      , "cherry"      , 3 },
        { "Vegetables"  , "carrot"      , 3 },
        { "Vegetables"  , "potato"      , 1 },
        { "Vegetables"  , "onion"       , 7 },
        { "Fruits"      , "cherry"      , 3 }        
    }
)
---
ThisTable = 
DATATABLE (
    "Product", STRING,
    {
        { "apple" },
        { "banana" },
        { "blackberry" },
        { "carrot" },
        { "cherry" },
        { "onion " },
        { "potato" },
        { "watermelon" },
        { "wolfberry" }
    }
)

Query no. 1

EVALUATE
ADDCOLUMNS (
    VALUES ( ThisTable[Product] ),
    "FilterLookup",
    VAR LookupKey = ThisTable[Product]
    RETURN
        CALCULATE ( MAX ( SearchTable[Category] ), SearchTable[Product] = LookupKey )
)

The query has these stats:

[screenshot: DAX Studio server timings for query no. 1]

And the query plan:

[screenshot: DAX Studio query plan for query no. 1]

Query no. 2

EVALUATE
ADDCOLUMNS (
    VALUES ( ThisTable[Product] ),
    "FilterLookup", MAXX (
        FILTER ( SearchTable, SearchTable[Product] = ThisTable[Product] ),
        SearchTable[Category]
    )
)

Stats:

[screenshot: DAX Studio server timings for query no. 2]

The query plan:

[screenshot: DAX Studio query plan for query no. 2]

The question is related to:

DAX lookup first non blank value in unrelated table

You can download a pbix file with sample data:

DAX lookup top 1 value.pbix

Przemyslaw Remin
3 Answers


It's hard to extrapolate performance from a tiny data set to a large one due to fixed overhead costs, so I'd recommend testing on larger data tables.
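
For example, a larger SearchTable for such a test could be generated directly in DAX. The following is only a rough sketch with made-up cardinalities and a hypothetical table name (SearchTableBig), not something from the original answer:

SearchTableBig =
SELECTCOLUMNS (
    GENERATESERIES ( 1, 1000000 ),                    -- 1M rows
    "Category", "Category " & MOD ( [Value], 100 ),   -- 100 distinct categories
    "Product", "Product " & MOD ( [Value], 10000 ),   -- 10,000 distinct products
    "Amount", MOD ( [Value], 10 )
)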

In general, you want to avoid iterators like MAXX in favor of MAX when possible, as the latter has under-the-hood engine optimizations. Very few rules are universal in query optimization, so this is a rather opinion-based question given the data you've shown.

Alexis Olson
  • I would like to find out what useful information we can glean from DAX Studio. The message from the stats and query plans is not clear to me. I also thought that `MAXX` would be slower than `MAX`, but the DAX Studio stats contradict this belief :-) – Przemyslaw Remin Jan 23 '20 at 17:08
  • I'm saying you can't tell a lot from the experiment you've done (unless perhaps you're very good at reading query plans). I'd bet the results would be different with a larger set (`MAX` likely to be faster). – Alexis Olson Jan 23 '20 at 18:35
  • MAX is internally executing MAXX (if applied to a single column), so here MAXX and MAX will have exactly the same performance. Things are different if there is context transition. https://dax.guide/max/ – RADO Jan 23 '20 at 22:07
  • Thanks for that @RADO. I've tried this with larger datasets too and it does look like the `MAXX` version is faster, though not if I use `FILTER` inside the `CALCULATE` instead of the simple boolean filter (I think that's what makes OP's first one slower.). – Alexis Olson Jan 23 '20 at 22:39
  • Alexis, can you explain your findings? What does it mean: "the MAXX version is faster, though not if I use FILTER inside the CALCULATE instead of the simple boolean filter"? Maybe put that in your answer, please. – Przemyslaw Remin Jan 24 '20 at 09:42
  • Ignore that. I've run tests on your two different versions for larger sets (millions of records for ~10s queries) and the performance is not distinguishable (there's a larger difference between repeat runs of the same query than between the different queries). – Alexis Olson Jan 24 '20 at 15:39

You can't really tell from DAX Studio when the data sets are so small, but in most cases the query with the least complex query plan will be fastest. This is the case for query #1, and that will indeed be the fastest query in your case (ignore all time measurements below ~20 ms, as they are unreliable because the data sets are so small).

Also, I'd like to add that the following query should provide the same result, and be even faster, with an even simpler query plan than both of your queries:

EVALUATE
ADDCOLUMNS (
    ThisTable,
    "FilterLookup",
        LOOKUPVALUE ( SearchTable[Category], SearchTable[Product], ThisTable[Product] )
)

Edit: I didn't notice that "apple" appears more than once in the SearchTable[Product] column, with different categories. This will cause the above call to LOOKUPVALUE(...) to fail, since it won't be able to find an unambiguous value for SearchTable[Category].
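
A minimal sketch of a possible workaround, assuming a DAX version in which LOOKUPVALUE accepts the optional alternateResult argument: the alternate value is returned both when a product matches several distinct categories and when it matches none, so this is not equivalent to the MAX-based queries above.

EVALUATE
ADDCOLUMNS (
    ThisTable,
    "FilterLookup",
        LOOKUPVALUE (
            SearchTable[Category],
            SearchTable[Product], ThisTable[Product],
            BLANK ()  -- alternateResult: returned for zero or multiple distinct matches
        )
)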

Dan
  • (1) Thank you for that. In most cases when Alberto and Marco (they need no introduction) demonstrate optimization tricks, they simplify the query plans and then the execution time drops. But here the hints are mixed. You advise the simplest query plan; what about the number of scans? The second query has only 2 scans while the first has 3 scans. (2) The code you provided won't work for the sample data. It will raise the error: `A table of multiple values was supplied where a single value was expected.` Please see https://stackoverflow.com/q/59310106/1903793 – Przemyslaw Remin Jan 24 '20 at 09:33
  • (1) Minimizing the number of SE queries (scans) is not a goal in itself. It's better to have 10 SE queries that return only very few rows each (late materialization), than a single SE query which returns a large number of rows (early materialization). But again - for a query such as this, over a dataset with only 10 records, it really doesn't matter if you do one thing or the other. – Dan Jan 24 '20 at 09:40
  • (2) I didn't notice that your SearchTable has "apple" more than once, which is why this is failing. For LOOKUPVALUE to work, it must be able to return an unambiguous result - in other words, the search column should only contain unique values. – Dan Jan 24 '20 at 09:42
  • (2) That is why I originally posed the problem as "Look up the first non-blank value". But it turned out that the MAX and MAXX approaches were way faster than the solution proposed by Karl Anka here: https://stackoverflow.com/a/59318268/1903793 I started measuring performance with DAX Studio, and that is why we are here. – Przemyslaw Remin Jan 24 '20 at 09:48
  • Ok - you should let us know what the query plans and SE queries look like on your original query, which I assume scans a much bigger data set, as it's impossible to conclude anything regarding optimization on the very small sample data set that you're showing in this post. Also - you can't assume that the formula engine will generate the same query plan for a small data set, as it would for a real-life big dataset. There are many internal optimizations that may kick in depending on the actual cardinality and statistics of the data involved. – Dan Jan 24 '20 at 09:54

Query 1 has better performance since its filter scans only a column, not an entire table. However, you can use TREATAS to get even better performance while also maintaining data lineage.
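
The answer gives no code; a minimal sketch of what such a TREATAS-based lookup might look like (to be measured in DAX Studio like the queries above):

EVALUATE
ADDCOLUMNS (
    VALUES ( ThisTable[Product] ),
    "FilterLookup",
        VAR LookupKey = ThisTable[Product]
        RETURN
            CALCULATE (
                MAX ( SearchTable[Category] ),
                -- the one-row table { LookupKey } is applied as a filter
                -- on SearchTable[Product], keeping that column's lineage
                TREATAS ( { LookupKey }, SearchTable[Product] )
            )
)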