On SSAS 2017 Tabular, I have SQL Agent jobs that run some t-sql steps and ssis packages and finish with an SSAS refresh. I have hundreds of these across three servers. (This is a system I have inherited so it was not my design to have two SQL Server instances and two SSAS instances on each of the three Windows VMs!)
Roughly once a week or so I get a "long running job" alert and it is always the final SSAS step on which it is stuck. It is a different job each time. The problem has been happening for a month or so and started around the time we implemented CU22.
Looking in $SYSTEM.DISCOVER_CONNECTIONS etc I see the Session spid and Connection Id, so can cancel using this type of thing:
<Cancel xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<ConnectionID>192507</ConnectionID>
<SessionID>40C4F30C-CD9C-4F61-A664-5F97B1AE1E7C</SessionID>
<SPID>983599</SPID>
<CancelAssociated>1</CancelAssociated>
</Cancel>
This clears the connection and sets the Session Status from 1 to 3, which I believe means closed. If I try to execute a refresh, either manually through SSMS or by re-running the final job step, it just sits there doing nothing. I presume it is still being blocked by the original hung process.
$SYSTEM.DISCOVER_SESSIONS shows 0 Session_Idle_Time, 93 millisecs of Session_CPU_Time and 92807506 Session_Elapsed_Time. So, if it wasn't idle and isn't using a lot of cpu, what is it waiting for?
The only resolution at the moment is restarting SSAS, which is far from ideal.
Obviously I can run Profiler but I'd have to leave it running continually for a week or more to catch it.
I'm a DBA of many years but have never worked with SSAS before. What can I look at to find out what is causing the problem in SSAS?