0

I have a bit of an odd occurrence that I'm trying to troubleshoot and wondering if any one has experienced similar issues and/or has any suggestions.

I have a query (not the best query, not the worst query) that has a number of similar unions (basically taking different data out of the same table and union all together) that then runs across multiple CPUs (the server has one socket 12 CPUs; the SQL server has Max Degree of Parallelism of 4 and a Cost Threshold of Parallelism of 55)

The expected behavior: for this query would then be that it runs across four CPUs; that it would max out those four CPUs (i.e. at 100%); that it would generate a number of CXPACKET waits; and that it would finish; also that other queries coming in would then untilise the other CPUs and run in the same fashion as if this query wasn't running (i.e. the other CPUs would be at 10% running other queries)

The actual behavior: seems to be that when this query is run four CPUs go to 100%; that there are a number of CXPACKET waits; that the other CPUs are showing utilisation (5-8%); that this query finishes and returns results; however other queries run seem to timeout / not complete / etc... (generally fail)

Any ideas what could be happening there? I feel like there is some SQL Server oddness that I'm not aware of? That the other queries are not being handled properly (maybe trying to hit the same CPUs for some reason?) That maybe there is some setting or control that can be put in place that would resolve this?

ChrisHDog
  • 4,473
  • 8
  • 51
  • 77
  • 1
    Check physical I/O and check memory grants -- if the optimizer estimates needing a ton of memory up front, other queries can get blocked even if the actual execution never reaches those amounts. In general, take a look at execution plans and wait stats for the queries that "seem to timeout/not complete". Although there are corner cases where allocating workers to CPUs goes pathological, those are far less common, and usually seen only when you have way more than 12 cores. – Jeroen Mostert Apr 08 '19 at 07:42
  • Physical I/O seems to be OK (checked in Activity Monitor and seems like the graph is at the bottom (not spiking etc...) ... can you point me in the right direction on how to check memory grants (that sounds like it might be impacting; just not sure the best way to query / view those?) – ChrisHDog Apr 09 '19 at 10:29
  • 1
    Step one is getting wait stats and execution plans -- those include memory grant details. As per my favorite sleuth, even though memory grants sound like a tempting cause, it is a capital mistake to theorize before one has data. See (e.g.) [this question](https://stackoverflow.com/q/7359702/4137916), [this](https://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/) for more on wait stats and the ever popular [first responder kit](https://www.brentozar.com/responder/) for a more thorough approach. – Jeroen Mostert Apr 09 '19 at 10:57

0 Answers0