1

I came across the following in some code that I was refactoring today.

context.Entities.Where(x => x.ForeignKeyId == id)
    .OrderBy(x => x.FirstSortField)
    .OrderBy(x => x.SecondSortField);

Initially, I took out the .OrderBy(x => x.FirstSortField) thinking that the first OrderBy statement would just be replaced by the second OrderBy. After testing, I realized that it was generating the SQL as ORDER BY SecondSortField, FirstSortField.

Therefore, the equivalent is actually:

context.Entities.Where(x => x.ForeignKeyId == id)
    .OrderBy(x => x.SecondSortField)
    .ThenBy(x => x.FirstSortField);

Can anyone explain the reasoning behind EF6 doing this? It seems to me that it would be more intuitive to replace the first sort field with the second.

David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67
Wyatt Earp
  • 1,783
  • 13
  • 23
  • 1
    The first `OrderBy` *is* replaced by the second. Your "equivalent" is not an equivalent. – Gert Arnold Jan 26 '21 at 22:18
  • @GertArnold, That's what I initially thought it would do, until I started testing. If you try this in LinqPad, for example, the SQL it generates for these two statements is equivalent. – Wyatt Earp Jan 26 '21 at 22:19
  • Of course that's what I did. EF6, EF core 3. Sql Server. Maybe another database provider is in play? – Gert Arnold Jan 26 '21 at 22:25
  • @GertArnold It's not a simple replacement if you're sorting data with lots of ties on the later field. – Joel Coehoorn Jan 26 '21 at 22:31
  • 1
    @JoelCoehoorn It's not LINQ-to-objects. The query is very simply translated into one ORDER BY statement. Is see it happen right in front of me. – Gert Arnold Jan 26 '21 at 22:39
  • 1
    @WyattEarp Since you seem to actually see the generated `ORDER BY` as you show it, it makes me wonder what the defining factor is here. Are you in SQL server too? Is the actual query you based this upon different in some aspect that didn't seem important? Really, it's not that I don't believe you, it's this difference that fascinates me. – Gert Arnold Jan 26 '21 at 22:54
  • Please see my comment on user1344783's answer. Are you sure you're using EF as data provider? – Gert Arnold Jan 27 '21 at 08:11
  • @GertArnold, I am trying it in LinqPad 5 (EF 6) and LinqPad 6 (EF Core) and seeing the same behavior in both. The database I'm testing against is SQL Server 2016. I also simplified the query that I'm using down to bare minimum. I'm doing var d = `Dashboards.OrderBy(o => o.Name).OrderBy(o => o.DisplayName).ToList();`, where Dashboards is a table in the database. The SQL generated is `SELECT [t0].[DashboardId], ... FROM [UI].[Dashboard] AS [t0] ORDER BY [t0].[DisplayName], [t0].[Name]` – Wyatt Earp Jan 27 '21 at 12:38
  • 1
    `[t0]` is the LINQ-to-SQL prefix. EF generates other prefixes. – Gert Arnold Jan 27 '21 at 12:41

3 Answers3

3

This is all in terms of local data, but EF wants to do the logical equivalent when building the expression tree and writing your query.

You should research the concept of stable sorting. When you use a stable sorting algorithm, the original order of equal items is preserved.

So let's say you have data like this with obvious first name/last name fields:

Brad Jones
Tom Smith
Sam Jones
Jim Doe
James Smith
Ryan Smith

If you order initially by only by first name, you get this:

Brad Jones
James Smith
Jim Doe
Ryan Smith
Sam Jones
Tom Smith

If you now take this sorted list, and again sort by last name, you get a result sorted by both fields, where later sorts have precedence over earlier sorts... but you only guarantee the exact order if the sort is stable:

Jim Doe
Brad Jones
Sam Jones
James Smith
Ryan Smith
Tom Smith

This brings us to the question of what algorithm .Net uses, and whether it's stable. To the documentation we go, where we find this in the Remarks section:

This method performs a stable sort

The specific algorithm is not documented here. I believe it's a Quicksort, but leaving that out of the documentation is probably intentional, to allow the maintainers to update for the best available option that meets the stability requirement should something better be discovered.

But, again, that's for local data. Databases will do what the SQL tells them.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • The docs also state: _"Because IOrderedEnumerable inherits from IEnumerable, you can call OrderBy or OrderByDescending on the results of a call to OrderBy, OrderByDescending, ThenBy or ThenByDescending. **Doing this introduces a new primary ordering that ignores the previously established ordering**."_ The "stable" property indeed preserves the order of equal items in the second sort, so the result looking like the sort is inverted in regards to OrderBy calls is dependent on the contents of the data being sorted. – CodeCaster Jan 26 '21 at 22:33
  • @CodeCaster That statement is misleading. In my examples, after the first sort Brad Jones comes before Jim Doe. I believe what they're trying to say is calling OrderBy() a second time can override that, which we see in the second example, where Brad Jones comes after. It ignores the order in that it pays no special attention, but because the order is there in the IOrderedEnumerable, you get predictable results. And if this were linq to entities, you might have a point about using `.ThenBy()`. But for EF, this will be translated to SQL and do what the OP wants. – Joel Coehoorn Jan 26 '21 at 22:35
3

I can only conclude that we're actually looking at LINQ-to-SQL here. In Linqpad until v. 5, it's very easy to make this mistake, because the selection of a EF6 DbContext driver is easily overlooked when creating a new connection. (In Linqpad v6 this choice is more conspicuous).

I have tested the reported behavior in EF6, EF-core 3 and 5, and in LINQ-to-SQL. Only in the latter do I see a generated SQL statement with two columns in the ORDER BY.

The statement...

Products.OrderBy(p => p.Description).OrderBy(p => p.LastSale)

...is translated by LINQ-to-SQL as:

SELECT [t0].[ID], [t0].[Description], [t0].[Discontinued], [t0].[LastSale]
FROM [Product] AS [t0]
ORDER BY [t0].[LastSale], [t0].[Description]

The reasoning is explained in this answer, which boils down to: LastSale is the dominant ordering field as it, sort of, overrides the first OrderBy.

All EF queries only have ORDER BY LastSale.

I must say agree with EF's implementation. As this answer explains, the result of two consecutive orderings depends on the sorting algorithm. Which means that all we can say for sure is that the results from any LINQ query will be ordered by LastSale and the ordering within LastSale groups isn't certain. Then, IMO, it's a better choice for SQL translation to handle a second OrderBy statement as a full override of the first one so it's visible that no expectations can be based on the first statement. To me it's more intuitive.

The message is: be explicit when ordering by multiple fields. Use OrderBy - ThenBy. Don't rely on a database provider's handling of consecutive OrderBy statements.

Gert Arnold
  • 105,341
  • 31
  • 202
  • 291
  • 1
    I too agree with EF's implementation, due to the nature of deferred execution. I see that if I change to a custom typed datacontext, using EF6, in LinqPad, I see the expected behavior. Thanks for helping me understand! – Wyatt Earp Jan 27 '21 at 13:07
2

Actually, it is quite simple and it makes total sense:

The first part of the query

context.Entities.Where(x => x.ForeignKeyId == id)

would be translated to SQL more or less like this

select * from Entities

Adding the first order by

.OrderBy(x => x.FirstSortField)

it would be translated to this

select * from (
   select * from Entities
)
order by FirstSortField

and then adding the second order by

.OrderBy(x => x.FirstSortField)
.OrderBy(x => x.SecondSortField)

would be translated to:

select * from (
   select * from (
      select * from Entities
   )
   order by FirstSortField
)
order by SecondSortField

Entity framework is smart enough to simplify to something like

select * from Entities
order by SecondSortField, FirstSortField
Hayden
  • 2,902
  • 2
  • 15
  • 29
jalepi
  • 184
  • 3
  • 1
    A bit too smart, as semantically only `order by SecondSortField` should remain. The subquery is fundamentally unordered without a `TOP` – Charlieface Jan 26 '21 at 22:08
  • No, Entity framework is smart enough to just ditch the first `OrderBy`. Please check things like this before posting. – Gert Arnold Jan 26 '21 at 22:14
  • @GertArnold, you should check yourself. EF used to "override" the orderbys, but currently it composes it. Try it yourself and see with your eyes if you don't believe. – jalepi Jan 26 '21 at 22:31
  • 1
    Well, maybe I'm crazy, but I see two chained OrderBy's translated into one `ORDER BY` in SQL. Both with EF6 (which the question is about) and EF core 3. – Gert Arnold Jan 26 '21 at 22:41
  • @GertArnold are you sure (you are using different properties each order by :D? I have LinqPad open with EF6 and it translates both order bys in .Net Core 3, 3.1, and 5.0. – jalepi Jan 26 '21 at 22:58
  • Yep. The only difference I can think of is the database provider. Mine is Sql Server. The query is a simple DbSet + Where + 2 OrderBy's. – Gert Arnold Jan 26 '21 at 23:01
  • That could be the difference, I used LINQPad DemoDB.sdf with SQL CE 4.0 Provider – jalepi Jan 26 '21 at 23:14
  • Looks like you're using LINQ-to-SQL then, which is the default Linqpad (5) data provider. – Gert Arnold Jan 27 '21 at 08:15