0

I have a rather large query that is running very very slow on a DB with very little data (the highest rowcount for the table this query accesses is 8K, and most of the filters are on IDs with EF navigation properties, the only parameter passed is one single guid).

It was running silky smooth but after a few change (sadly we changed a "lot" today to rework it into a single query while before it was over 30 in cshtml with lazy loading and is now all preloaded at the start of the file) it now runs at a snails pace (timeouts at 30sec and it takes 34). Running it in SQL server management studio provides the same runtime (34 sec), doing it multiple times doesn't speed it up, showing the execution plan, waiting a bit, then running it again and at some point it goes from 34 to 0 seconds.

We have ruled out the following:

  • Query plan execution caching (How can I clear the SQL Server query cache?) : time to execute doesn't change after flushing the plans, once it's fast it stays fast

  • Result isn't cached : changing the guid (which change the values of everything else) doesn't change things, once it's fast it stays fast

  • SQL Server Installation issue : when this happened on my coworker's PC i grabed the latest version and reproduced the same issue

  • Issues with the computer : nothing taking CPU time except SQL server, nothing else opened, both computers ran just fine and aren't related (not on the same network/ not on a domain etc), something that would effect them both negatively at the same instant can be ruled out.

We are at loss because it's really hard to debug "why" it's slow to begin with when it ends up executing instantaneously out of nowhere.

I'm not sure the query is that useful hence why i post it last, I renamed some tables (from T0 T1 etc) to make it easier to read but it's far from easy to digest. You'll find bellow both the C# code used to generate it and the SQL result that takes 34 seconds, again and again, until it takes 0!

LINQ (ef core latest, 3.1) :

Context.Orders
    .Where(o => o.UniqueId == new Guid("MY GUID HERE"))
    .SelectMany(o => o.Products)
    .Where(FranceMontgolfieres.Models.Mappings.ProductOrder.IsTicketExpression)
    .Select(po => new
    {
        po.UniqueId,
        ProductId = po.Product.Id,
        ProductName = po.Product.Name,
        AllPassengerAreRegistered = po.Passengers.All(pa => pa.Passenger != null),
        passengers = po.Passengers
                .Select(pa => new
                {
                    pa.UniqueId,
                    IsRegistered = pa.Passenger != null,
                    pa.Passenger.FirstName,
                    pa.Passenger.LastName,
                    FullName = pa.Passenger.FirstName + " " + pa.Passenger.LastName,
                    HasFlown = pa.Passenger.Manifests
                    .Any(m => m.History
                        .Any(h => h.Status == PassengerManifestStatus.Flown)),
                    HasPendingFlight = pa.Passenger.Manifests
                    .Any(m => m.History
                        .Any(h => h.Status == PassengerManifestStatus.Incoming)),
                    HasPendingReport = pa.Passenger.Manifests
                    .Any(m => m.History
                        .Any(h => h.Status == PassengerManifestStatus.Report)),
                    Manifests = pa.Passenger.Manifests
                        .Select(m => new
                        {
                            manifest = new
                            {
                                HasFlown = m.History
                                    .Any(h => h.Status == PassengerManifestStatus.Flown),
                                HasPendingFlight = m.History
                                    .Any(h => h.Status == PassengerManifestStatus.Incoming),
                                HasPendingReport = m.History
                                    .Any(h => h.Status == PassengerManifestStatus.Report),
                                m.Manifest.StartDate,
                                m.Manifest.FlightBase.Name,
                                m.Manifest.FlightBase.Location,
                                m.Manifest.State
                            },
                            history = m.History
                                .Select(h => new
                                {
                                    h.Date,
                                    h.Status
                                })
                        })
                }),
        PendingReservationExists = po.PendingReservation != null
    })
    .ToList();

What we run before the query (still takes 0 sec after) to make sure it's not being cached:

CHECKPOINT; 
GO 
DBCC DROPCLEANBUFFERS; 
GO
DBCC FREEPROCCACHE;
GO
DBCC FREESYSTEMCACHE("ALL");
GO
DBCC FREESESSIONCACHE;
GO
DBCC FLUSHAUTHCACHE;
GO

SQL Query :

SELECT [ProductOrders_p1].[UniqueId],
    [Products_p2].[Id], 
    [TranslatableString_t].[Id], 
    [TranslatableString_t].[MainPageCompanyDetailId], 
    [TranslatableString_t].[Text], 
    CASE
          WHEN NOT EXISTS (
              SELECT 1
              FROM [ProductOrderPassenger] AS [p]
              LEFT JOIN [Passengers] AS [p0] ON [p].[Id] = [p0].[ProductOrderPassengerId]
              WHERE ([ProductOrders_p1].[Id] = [p].[ProductOrderId]) AND [p0].[Id] IS NULL) THEN CAST(1 AS bit)
          ELSE CAST(0 AS bit)
      END,
    CASE
          WHEN [PendingReservations_p3].[Id] IS NOT NULL THEN CAST(1 AS bit)
          ELSE CAST(0 AS bit)
      END, 
    [o].[Id], 
    [ProductOrders_p1].[Id], 
    [t3].[UniqueId], 
    [t3].[c], 
    [t3].[FirstName], 
    [t3].[LastName], 
    [t3].[c0], 
    [t3].[c1], 
    [t3].[c2],
    [t3].[c3], 
    [t3].[Id], 
    [t3].[c4], 
    [t3].[c00], 
    [t3].[c10],
    [t3].[StartDate], 
    [t3].[Id0], 
    [t3].[MainPageCompanyDetailId], 
    [t3].[Text], 
    [t3].[Id00], 
    [t3].[MainPageCompanyDetailId0], 
    [t3].[Text0], 
    [t3].[State], 
    [t3].[ManifestId], 
    [t3].[PassengerId], 
    [t3].[Id1], 
    [t3].[Date],
    [t3].[Status], 
    [t3].[Id2]
      FROM [Orders] AS [o]
      INNER JOIN [ProductOrders] AS [ProductOrders_p1] ON [o].[Id] = [ProductOrders_p1].[OrderId]
      LEFT JOIN [Products] AS [Products_p2] ON [ProductOrders_p1].[ProductId] = [Products_p2].[Id]
      LEFT JOIN [TranslatableStrings] AS [TranslatableString_t] ON [Products_p2].[NameId] = [TranslatableString_t].[Id]
      LEFT JOIN [PendingReservations] AS [PendingReservations_p3] ON [ProductOrders_p1].[Id] = [PendingReservations_p3].[ProductOrderId]
      LEFT JOIN (
          SELECT [ProductOrderPassengers_p4].[UniqueId], CASE
              WHEN [Passengers_p5].[Id] IS NOT NULL THEN CAST(1 AS bit)
              ELSE CAST(0 AS bit)
          END AS [c], [Passengers_p5].[FirstName], [Passengers_p5].[LastName], ([Passengers_p5].[FirstName] + N' ') + [Passengers_p5].[LastName] AS [c0], CASE
              WHEN EXISTS (
                  SELECT 1
                  FROM [ManifestPassengers] AS [m]
                  WHERE ([Passengers_p5].[Id] IS NOT NULL AND ([Passengers_p5].[Id] = [m].[PassengerId])) AND EXISTS (
                      SELECT 1
                      FROM [ManifestPassengerHistories] AS [m0]
                      WHERE (([m].[ManifestId] = [m0].[ManifestPassengerManifestId]) AND ([m].[PassengerId] = [m0].[ManifestPassengerPassengerId])) AND ([m0].[Status] = 5))) THEN CAST(1 AS bit)
              ELSE CAST(0 AS bit)
          END AS [c1], CASE
              WHEN EXISTS (
                  SELECT 1
                  FROM [ManifestPassengers] AS [m1]
                  WHERE ([Passengers_p5].[Id] IS NOT NULL AND ([Passengers_p5].[Id] = [m1].[PassengerId])) AND EXISTS (
                      SELECT 1
                      FROM [ManifestPassengerHistories] AS [m2]
                      WHERE (([m1].[ManifestId] = [m2].[ManifestPassengerManifestId]) AND ([m1].[PassengerId] = [m2].[ManifestPassengerPassengerId])) AND ([m2].[Status] = 1))) THEN CAST(1 AS bit)
              ELSE CAST(0 AS bit)
          END AS [c2], CASE
              WHEN EXISTS (
                  SELECT 1
                  FROM [ManifestPassengers] AS [m3]
                  WHERE ([Passengers_p5].[Id] IS NOT NULL AND ([Passengers_p5].[Id] = [m3].[PassengerId])) AND EXISTS (
                      SELECT 1
                      FROM [ManifestPassengerHistories] AS [m4]
                      WHERE (([m3].[ManifestId] = [m4].[ManifestPassengerManifestId]) AND ([m3].[PassengerId] = [m4].[ManifestPassengerPassengerId])) AND ([m4].[Status] = 2))) THEN CAST(1 AS bit)
              ELSE CAST(0 AS bit)
          END AS [c3], [ProductOrderPassengers_p4].[Id], [t2].[c] AS [c4], [t2].[c0] AS [c00], [t2].[c1] AS [c10], [t2].[StartDate], [t2].[Id] AS [Id0], [t2].[MainPageCompanyDetailId], [t2].[Text], [t2].[Id0] AS [Id00], [t2].[MainPageCompanyDetailId0], [t2].[Text0], [t2].[State], [t2].[ManifestId], [t2].[PassengerId], [t2].[Id1], [t2].[Date], [t2].[Status], [t2].[Id2], [ProductOrderPassengers_p4].[ProductOrderId]
          FROM [ProductOrderPassenger] AS [ProductOrderPassengers_p4]
          LEFT JOIN [Passengers] AS [Passengers_p5] ON [ProductOrderPassengers_p4].[Id] = [Passengers_p5].[ProductOrderPassengerId]
          LEFT JOIN (
              SELECT CASE
                  WHEN EXISTS (
                      SELECT 1
                      FROM [ManifestPassengerHistories] AS [m5]
                      WHERE (([m8].[ManifestId] = [m5].[ManifestPassengerManifestId]) AND ([m8].[PassengerId] = [m5].[ManifestPassengerPassengerId])) AND ([m5].[Status] = 5)) THEN CAST(1 AS bit)
                  ELSE CAST(0 AS bit)
              END AS [c], CASE
                  WHEN EXISTS (
                      SELECT 1
                      FROM [ManifestPassengerHistories] AS [m6]
                      WHERE (([m8].[ManifestId] = [m6].[ManifestPassengerManifestId]) AND ([m8].[PassengerId] = [m6].[ManifestPassengerPassengerId])) AND ([m6].[Status] = 1)) THEN CAST(1 AS bit)
                  ELSE CAST(0 AS bit)
              END AS [c0], CASE
                  WHEN EXISTS (
                      SELECT 1
                      FROM [ManifestPassengerHistories] AS [m7]
                      WHERE (([m8].[ManifestId] = [m7].[ManifestPassengerManifestId]) AND ([m8].[PassengerId] = [m7].[ManifestPassengerPassengerId])) AND ([m7].[Status] = 2)) THEN CAST(1 AS bit)
                  ELSE CAST(0 AS bit)
              END AS [c1], [m9].[StartDate], [t0].[Id], [t0].[MainPageCompanyDetailId], [t0].[Text], [t1].[Id] AS [Id0], [t1].[MainPageCompanyDetailId] AS [MainPageCompanyDetailId0], [t1].[Text] AS [Text0], [m9].[State], [m8].[ManifestId], [m8].[PassengerId], [m9].[Id] AS [Id1], [m10].[Date], [m10].[Status], [m10].[Id] AS [Id2]
              FROM [ManifestPassengers] AS [m8]
              INNER JOIN [Manifests] AS [m9] ON [m8].[ManifestId] = [m9].[Id]
              LEFT JOIN [FlightBases] AS [f] ON [m9].[FlightBaseId] = [f].[Id]
              LEFT JOIN [TranslatableStrings] AS [t0] ON [f].[NameId] = [t0].[Id]
              LEFT JOIN [TranslatableStrings] AS [t1] ON [f].[LocationId] = [t1].[Id]
              LEFT JOIN [ManifestPassengerHistories] AS [m10] ON ([m8].[ManifestId] = [m10].[ManifestPassengerManifestId]) AND ([m8].[PassengerId] = [m10].[ManifestPassengerPassengerId])
          ) AS [t2] ON [Passengers_p5].[Id] = [t2].[PassengerId]
      ) AS [t3] ON [ProductOrders_p1].[Id] = [t3].[ProductOrderId]
      WHERE ([o].[UniqueId] = 'd26013b9-b9bc-4497-b351-544739851d3d') AND EXISTS (
          SELECT 1
          FROM [ProductProductComponents] AS [p6]
          INNER JOIN (
              SELECT [p7].[Id], [p7].[DescriptionId], [p7].[Discriminator], [p7].[DisplayDescriptionInProduct], [p7].[FunctionnalName], [p7].[HTPrice], [p7].[IsArchived], [p7].[IsChildrenPrice], [p7].[IsOptionnal], [p7].[IsSelectedInBasketByDefault], [p7].[IsShipping], [p7].[NameId], [p7].[ParentProductComponentId], [p7].[ShowInBasket], [p7].[ShowInInvoice], [p7].[VATRate], [p7].[AllowPartnerBase], [p7].[AllowWeekends], [p7].[ValidForTicks]
              FROM [ProductComponents] AS [p7]
              WHERE [p7].[Discriminator] IN (N'ProductComponent', N'FlightProductComponent')
          ) AS [t4] ON [p6].[ProductComponentId] = [t4].[Id]
          WHERE ([Products_p2].[Id] IS NOT NULL AND ([Products_p2].[Id] = [p6].[ProductId])) AND ([t4].[Discriminator] = N'FlightProductComponent'))
      ORDER BY [o].[Id], [ProductOrders_p1].[Id], [t3].[Id], [t3].[ManifestId], [t3].[PassengerId], [t3].[Id1], [t3].[Id2]
Ronan Thibaudau
  • 3,413
  • 3
  • 29
  • 78
  • 1
    Query Store is your friend in cases like these. It can record what queries are taking a long time and whether that's "the same" query with a query plan regression or something that looks similar but isn't. It also allows you to force execution plans if regression really is happening, but obviously, where possible it's better to look at more permanent fixes in the form of statistics maintenance/indexes/parameter sniffing switches/whatnot. Even just the automatic capturing of the execution plans for the fast/slow cases it does is useful, though. – Jeroen Mostert Jan 20 '20 at 15:46
  • @JeroenMostert There are no other queries running, just this one, nothing else is connected to the server it's just me running it on my local machine on my local sql server from management studio. Also there is no regression, quite the contrary it's a progression, it goes from 34 seconds to 0, but the expected time is close to 0 so i'm trying to figure out what is taking so long at first (the many first times) and then no longer is. With the commands i use to clear the cache it should never be considered the "same" query as the cache is empty – Ronan Thibaudau Jan 20 '20 at 15:56
  • Note that nothing in your "things we've ruled out" list actually demonstrates you're (not) getting the same execution plans. In particular, changing a constant value in the query may be subject to autoparameterization (i.e. still the same query), and getting a slow/fast query if you've cleared the cache merely proves that there is the potential for a slow/fast execution plan, not that the optimizer cannot later re-compile the thing into something faster/slower. Getting the same plan but having it run much slower is a much more interesting and uncommon case than different plans. – Jeroen Mostert Jan 20 '20 at 15:59
  • @JeroenMostert But (with the same parameter, never changing it), how could i be getting the same plan 10 times, and then a new plan out of the blue. Doesn't seem (especially if clearing the cache in between) that SQL server should make a different decision? – Ronan Thibaudau Jan 20 '20 at 16:00
  • @JeroenMostert Also updated what i use to clear everything between each query (not sure if it matters but it's the best, probably very redundant, i've found so far to cover everything) – Ronan Thibaudau Jan 20 '20 at 16:02
  • Indeed it should not make a different decision -- and nevertheless it may. But stop thinking and look! The actual execution plan(s)/`sys.dm_exec_cached_plans` will tell you what's what. If you have the exact same plan with different running times, then it's time to look at things like memory grants/locks/parallelism issues. If they're different plans, then you actually need to look at why the plan isn't always the same. Very different tracks of troubleshooting. – Jeroen Mostert Jan 20 '20 at 16:04
  • @JeroenMostert How could it be memory grants/locks/parallelism issue with a single user signed onto the DB running a single query? (is that possible or can i rule that out given nothing else uses sql server)? Also sys.dm_exec_cached_plans contains nothing after running the cache clearing query (except 2 lines which are i assume that very query?). How could i use it to debug my issue? All i get is a size type memory address and plan handle but not a view of a plan like what the plan analizer uses, am i missing something? – Ronan Thibaudau Jan 20 '20 at 16:08
  • Inspecting `sys.dm_exec_cached_plans` takes a bit more work than just listing the view; e.g. `select t.[text], p.query_plan from sys.dm_exec_cached_plans cp cross apply sys.dm_exec_query_plan(cp.plan_handle) p cross apply sys.dm_exec_sql_text(cp.plan_handle) t`. And yes, obviously, the plan cache will be empty directly after clearing it; you're looking at getting the plans both for the slow and fast instance of your query for comparison (from Management Studio, using the "Include Actual Execution Plan" option is simpler, but then you have to remember to save). – Jeroen Mostert Jan 20 '20 at 16:15
  • @JeroenMostert I'll try to repro the slow case (so far deleting the database/recreating it seemed to do the job), but if i find 2 different plans, and the second is massively faster while the first is unacceptably slow, do i have anyway to save that info and tell SQL server "here look, here's how you do it 1000X faster" ? Or am i just stuck? – Ronan Thibaudau Jan 20 '20 at 16:17
  • If you find different plans, you can look further into 1) why different plans were chosen (if this seems at all derived from, say, different estimates in number of rows in the input), 2) applying query hints to make it more likely that the "good" plan is chosen (including various `QUERYTRACEON` options but also `KEEPFIXED PLAN`), 3) using Query Store to force the "right" execution plan (doing this without QS is possible but much more involved). You want to avoid forcing plans as much as possible because this generally isn't a stable approach, but sometimes it's the only option. – Jeroen Mostert Jan 20 '20 at 16:22
  • @JeroenMostert Ok this keeps getting weirder, if i select show actual query plan when running it for the first time, it takes 2 seconds instead of 34 (but i can't do that in production through EF), but then it has the oposite effect, the second runtime (still with show actual plan) takes 18 seconds. So if i ask sql server to show me the actual plan (which if i understand correctly is what it would have generated anyway?) it takes 2 seconds instead of 34, and then moves to a slower plan. I think i may need to bring a DBA on this because it's out my league – Ronan Thibaudau Jan 20 '20 at 16:31
  • You may wish to double-check you don't have some funky network issues going on where an unfortunate confluence of data can cause packet fragmentation/resends. This can be verified with a packet sniffer like Wireshark. Try `SET STATISTICS IO, TIME ON` before the query in any case, to get SQL Server to spit out stats on the time *it* spent on the query. – Jeroen Mostert Jan 20 '20 at 16:35
  • @JeroenMostert No network issues possible, it’s a local install and communicating through shared memory not tcp/ip – Ronan Thibaudau Jan 20 '20 at 16:49

1 Answers1

0

Would it be possible to consider placing the query into a stored procedure? It should help keep the same plan as you insert in new values for a parameter. Alternatively you could attempt to refresh statistics and check the fragmentation on the index over UniqueID. That could be resulting in a bad plan by getting bad estimates.

If I had to take a stab at a guess the main index spots are Order, Productorder, Passengers and with the left join between Passengers and ProductOrderPassengers something weird goes on if the order of tables pulled flips. A trick you could try if you must remain ad hoc is something like

Declare @Oid varchar(10) Set @Oid = Select ID from Order where UniqueID = 'd26013b9-b9bc-4497-b351-544739851d3d'

then add a where clause for that ID if there is 1 row to help the estimate.

If its like a 1 uniqueID to many ID you could pull them to a temp table and then join it. I know those kinds of things helped tame our bad estimates but usually we try to only use simpler queries for ad hoc, complex go in the stored procs.

I don't think flushing your cache is doing you any favors.