EF Query containing "Distinct()" slow first time

Question

Having trolled the net many discussions cover aspects of my problem but I fail to see a pattern that applies here:

I have a basic .NET WinForm application using EF 3.5 with MySQL. The application works fine with most queries, but any query containing Distinct() is very slow or causes the "The timeout period elapsed prior to completion of the operation or the server is not responding." exception the first time it is run after a reboot of the machine. I.e. when restarting the aplication, even after stopping and re-starting the MySQL Service the Distinct() queries work fine from now on (i.e. taking less than 1 second).

The fact that I cannot reproduce the problem by re-running the application or re-starting MySQL rules out that EF compilation or MySQL caching is a factor. Only after rebooting the problem re-occurs. Makes me think it's perhaps a .NET initialization issue or something else related to the OS. The problem has been observed on XP and Win 7.

Any ideas are appreciated. (Also ideas on trying to reproduce the problem without re-booting the machine, which is highly impractical :-)

Update:

The query generated by EF looks like this:

SELECT Distinct1.C2 FROM (
    SELECT DISTINCT YEAR(Extent1.RecDate) AS C2 
    FROM dailyrecord AS Extent1
    WHERE (Extent1.STN = 430030) AND (Extent1.WBAN = 99999)
) AS Distinct1;

Running this in the mysql shell takes around 30s the first time, around 1s all subsequent times even after re-starting mysql. Back to 30s after rebooting. It appears the OS is reading the entire db file the first time, but not thereafter (very little HD activity on subsequent queries, even with modified query parameters).

How can I flush the OS's read cache to test my theory?

Prehaps the first `Distinct()` lets the DBMS spin up and cache all its data to do some kind of table scan, after which it remains in memory and thus fast. Can you intercept the plain query and execute it after a reboot to see if the slowness then still occurs? You can then identify the source of the problem with more accuracy. — CodeCaster, Nov 19 '12 at 12:03
@CodeCaster: Hmmm, the OP says that restarting MySQL service doesn't affect slow fetching, only full reboot does. AFAIK restarting MySQL service should cause it to drop all in-memory caches, both data and execution plans. — Boris B., Nov 19 '12 at 13:33
Have you tries tracing and manually executing the SQL query generated by EF? Take a look [here](http://stackoverflow.com/questions/1412863/how-do-i-view-the-sql-generated-by-the-entity-framework) — Boris B., Nov 19 '12 at 13:38
@BorisB. I assumed OP meant rebooting the MySQL server, you are right. If only the application is slow after a computer reboot, you'll want to look into the network configuration. — CodeCaster, Nov 19 '12 at 13:47
Thx 4 the ideas. Will try and run the raw query. Just reproducing the problem scenario (i.e. freshly booted machine) is such a pain :-( As for networking I looked into the well-known skip_name_resolve issue with MySQL, but I don't think it aplies to localhost - or does it? (Both client and server run on the local machine.) Is there a way to _reset_ the .Net framework or aspects of it, e.g. memory or DLL managment in order to narrow it down? Thx. Matthias — Matthias, Nov 20 '12 at 03:59

Matthias · Answer 1 · 2012-11-30T22:11:22.983

After quite a bit of testing this is what emerged:

A number of queries are by definition quite slow in MySQL and possibly (m)any DBMS, because they require the entire data file to be read, like the "DISTINCT" query above, where the "WHERE" clause included part of the primary key, but not the entire primary key (which in this case was a composite of 4 fields).
If slow queries are a frequent and an annoyance in an application one should add indexes on the search criteria ("WHERE" fields). This will speed up the query drastically (and was ultimately the solution to the originally posted problem), but it will increase the size of the data on disk (.MYI file) while slighlty decreasing the size of the .MYD file, but overall increase the total data size. Adding indexes may increase the time for INSERT, UPDATE and DELETE queries as the indexes need to be updated as well. Inserts are mostly done on a one-by-one basis so the performance penalties are often not noticeable. Unlike bulk DELETE queries (see below.)
The ineffiencies of queries are masked by the OS (MS Windows XP in my case). Any query that requires reading of the entire table record by record is very slow the first time it is encountered. However, the OS will cache the file and subsequent queries will be much faster even if the parameters change, even if the query itself changes: any ineffient query that requires reading of the entire table file will appear quick once it is cached by the OS. This makes tuning your queries difficult as you need to reboot your OS between tests. After trolling the net I have not yet come across a practical way to flush the read cache in Windows.
A related inefficient query is: DELETE FROM myTable WHERE field1 = value. This will be slow if there are many records (30sec for 10,000 records in my case) to delete (using MySQL, storage engine MYISAM in all of these examples) even if there is an index on field1. This is because the DBMS needs to read (and possibly write) the entire table file from (to) disk (depending on how the DELETE is carried out physically). Again, this will be masked by the OS: It is slow the first time, but quite fast (30 times faster on my XP machine) on subsequent queries, because the OS does some magic, which is difficult (impossible?) to turn off. The addition of indexes as suggested above may add to the time a delete query will take.

Any thoughts or comments are appreciated!

EF Query containing "Distinct()" slow first time

Update:

1 Answers1