1

Just wondering about some performance question for accessing random entries in SQL Server with Entity Framework.

The common solution recommended in many places I've found is to use

db.Table.OrderBy(x => Guid.NewGuid()).FirstOrDefault()

What I'm wondering about is if it would be more efficient to instead use

var skip = random.Next(0, db.Table.Count());
db.Table.OrderBy(x => x.Id).Skip(skip).FirstOrDefault()

Since the first alternative relies reordering every entry in the table randomly (unless SQL Server has some special optimization for this?) while the second should retrieve the number of items (which I believe runs in constant time) and then only retrieving a single element.

Kanadaj
  • 962
  • 9
  • 25
  • 4
    I just want to highlight that your second solution does two queries (`Count()` and `FirstOrDefault()`), hitting the DB two times. Also: when in doubt about performance, just benchmark it. – Federico Dipuma May 24 '17 at 10:37
  • var skip = random.Next(0, db.Table.Count()); db.Table.OrderBy(x => x.Id).Skip(skip).FirstOrDefault() => This statement will give you a better performance than first one – Hieu Le May 24 '17 at 10:39
  • You should use the approach shown in this link(BINARY_CHECKSUM ): https://msdn.microsoft.com/en-us/library/cc441928.aspx I don't think it can be applied to LINQ directly but you can create a stored-procedure and [use](https://stackoverflow.com/questions/17589745/how-to-use-stored-procedure-in-linq-to-entities) it in L2Entities – Tim Schmelter May 24 '17 at 10:45
  • 1
    Eric Lippert wrote a good article entitled [Which is faster?](https://ericlippert.com/2012/12/17/performance-rant/) You may wish to read it. – Damien_The_Unbeliever May 24 '17 at 10:45
  • Not sure about that, but doesn't `Count()` just execute a query with a full-table scan? If yes, this is not in constant time at all. – Federico Dipuma May 24 '17 at 10:51
  • "which I believe runs in constant time" -- you believe incorrectly. Counting rows is a transactional operation like any other and requires enumerating rows. Even if it *was* a constant time operation, `.Skip()` still isn't (though it's fast enough with an index). There are constant time ways of estimating the row count of a table using the system views, but `.Count()` doesn't use those. It is still true that one approach may be faster than another, but there benchmarking will help you out -- then you don't have to believe. – Jeroen Mostert May 24 '17 at 10:51
  • @Damien_The_Unbeliever Benchmarks in databases can be extremely volatile due to the number of possible variables involved, hence my attempt to ask here to see if there is someone who knows how it works under the hood. – Kanadaj May 24 '17 at 11:24
  • @JeroenMostert Isn't COUNT calculated using table size / row size for single tables? Assuming the database stores the table size (which it might not, but sounds like a good idea for optimisation considering how often it may be needed). But again, I am no DBMS engineer. Though, even if it's not constant, COUNT should be faster than a random sort, but the query optimizer might be doing stuff I am not aware of that requires special conditions. Not being aware of said conditions are the reason I am asking here instead of benchmarking. – Kanadaj May 24 '17 at 11:25
  • @Dracor - you're aware that there are a huge number of variables that can affect the performance, and yet *not mentioned any of them* in the question? – Damien_The_Unbeliever May 24 '17 at 11:26
  • @TimSchmelter Thanks, that is actually helpful. I was not aware of the usage of BINARY_CHECKSUM for this – Kanadaj May 24 '17 at 11:26
  • @Damien_The_Unbeliever The whole reason I ask is that I am most likely not aware of all the potential variables, only the chance that they exist. If I knew about all the variables and their effect on the performance, I wouldn't need to ask, would I? – Kanadaj May 24 '17 at 11:27
  • @DanielRusznyak: "Isn't COUNT calculated using table size / row size for single tables?" Nope. If you're interested, issue `SET STATISTICS IO ON` and do a `SELECT COUNT(*) FROM table`, or view an execution plan. You'll see a table/index scan. For starters, it's not possible to do it by dividing simple numbers because row size isn't a constant for most tables. – Jeroen Mostert May 24 '17 at 11:41
  • @JeroenMostert That makes sense, though what about indices on fixed length fields, i.e. foreign keys? – Kanadaj May 24 '17 at 11:45
  • Still no. If you want to ask "Is COUNT(*) in SQL Server a constant time operation? If not, why not?", feel free to actually ask it as a separate question. The closest I could find was [this](https://stackoverflow.com/questions/18807304/), but that's specific to SQLite. – Jeroen Mostert May 24 '17 at 11:50
  • @JeroenMostert Thanks for the answers, it always helps to have others fix one's misconceptions :). Also, the DB does actually do some magic if it can use an index scan, though that could be something completely different than the fixed row size, but might be related as well. – Kanadaj May 24 '17 at 12:16
  • This is now a [separate question](https://stackoverflow.com/questions/44158023/), and deservedly so. – Jeroen Mostert May 24 '17 at 12:20

0 Answers0