12

The Scenario

There are certain SQL Agent Jobs that are scheduled to run every few minutes throughout the day.

There are legitimate times when it will miss its next schedule because it's still running from the previous schedule.

Every once and a while, a job might 'hang'. This doesn't produce a failure (since the job hasn't stopped yet). When this happens, the job can be manually stopped and works fine the next time it runs. It's designed to pick back up where it left off.

What's the most efficient way...?

I'd like a way to determine how long (in seconds) a SQL Agent Job named 'JobX' is currently running. If it isn't currently running, we can just return zero.

This way, I can stop the job if it has been running for an amount of time beyond a certain threshold.

I assume that a combination of xp_sqlagent_enum_jobs and sysjobhistory could be used, but I'm curious if there are better solutions out there... and can hopefully benefit from the obstacles the rest of you have already run into and worked around.

Kevin Fairchild
  • 10,891
  • 6
  • 33
  • 52
  • 1
    Check out the following article. We had a similar issue, and implemented this solution: [SQL “Watchdog” loop to start and monitor SQL Agent Jobs](http://cc.davelozinski.com/code/sql-watchdog-loop-start-monitor-sql-agent-jobs) –  Feb 23 '17 at 07:04

3 Answers3

22

This solution would work:

SELECT DATEDIFF(SECOND,aj.start_execution_date,GetDate()) AS Seconds
FROM msdb..sysjobactivity aj
JOIN msdb..sysjobs sj on sj.job_id = aj.job_id
WHERE aj.stop_execution_date IS NULL -- job hasn't stopped running
AND aj.start_execution_date IS NOT NULL -- job is currently running
AND sj.name = 'JobX'
and not exists( -- make sure this is the most recent run
    select 1
    from msdb..sysjobactivity new
    where new.job_id = aj.job_id
    and new.start_execution_date > aj.start_execution_date
)

This a more general check dependent on system tables. If you'd prefer a custom route, you could have the job insert into a job log table you created instead.

Zhenny
  • 811
  • 6
  • 9
  • Wouldn't you want something like this added on, to further limit it just to the most recent activity? "inner join (SELECT MAX(start_execution_date) AS StartDate, job_id FROM msdb..sysjobactivity GROUP BY job_id) MostRecentActivity on sj.job_id = MostRecentActivity.job_id AND aj.start_execution_date=MostRecentActivity.StartDate" – Kevin Fairchild May 23 '12 at 19:56
  • Not necessarily. If you are only looking for a job that is currently running and running longer than you expect, this query should only return to you what is currently running. If it is not running, it returns nothing. – Zhenny May 23 '12 at 20:01
  • In practice, the results given on a job that is currently running actually gives a bunch of records back. Limiting it with that JOIN seems to give the correct values. I modified your answer and accepted. – Kevin Fairchild May 23 '12 at 20:07
  • Is that due to the issue you mentioned before where the job hangs and you have to manually stop it? I'm curious as to what the extra returned entries are. If there are extra returned entries, the query also runs into the issue of returning the difference between when the last job hung even if the job is not running. I'm going to modify the query a little as the join can be replaced with a top 1. – Zhenny May 23 '12 at 20:15
  • Zhenny, using the new code, I get results back like "11" or "12" while it is running and then results like "22916011" for the rest of the time. As soon as the job starts again, it goes back to single and double-digits. When the JOIN was there, I was not getting these results. I'd either see the number of seconds while it was running or no result. – Kevin Fairchild May 23 '12 at 20:24
  • I checked the sysjobactivity table it looks like there are old records that meet your criteria. So while the job isn't running, it's finding old activity. In this case, there's an entry with a start_execution_date of "2011-09-01 10:47:20.000" and no stop_execution_date. So when I was JOINing based on the MAX start date, it would pull back the match for either the currently-executing job or the most recently-executed job (which wouldn't meet the criteria in the WHERE, which prevents incorrect data being displayed) – Kevin Fairchild May 23 '12 at 20:27
  • I'm sorry. I missed the fact that any new runs could also be the MAX(start_execution_date). I edited the query again. – Zhenny May 23 '12 at 20:30
  • Perfect. Works like a champ now. – Kevin Fairchild May 23 '12 at 20:32
2
/**** FOR CURRENTLY RUNNING JOBS ****/
SELECT j.name AS Job_Name,DATEDIFF(ss,a.start_execution_date ,GETDATE ())   
FROM msdb.dbo.sysjobactivity a INNER JOIN msdb.dbo.sysjobs j 
ON a.job_id =j.job_id
WHERE CONVERT(DATE,a.start_execution_date )=CONVERT(DATE,GETDATE ())
AND a.stop_execution_date IS NULL


/*This code will give u the Name of currently running jobs and for how much time it is running & after that u can add filters to it as u wish*/
/*Thanks in advance*/
Bikash
  • 21
  • 2
0

What you're doing manually sounds like what's known as a "watch dog loop". Basically, an SQL job that starts and/or monitors agent jobs, killing them if need be.

The code below was taken from here, and should help, removing the need for you to manually monitor and kill jobs if they've been running for a lengthy period of time:

/****************************************************************
--This SQL will take a list of SQL Agent jobs (names must match),
--start them so they're all running together, and then
--monitor them, not quitting until all jobs have completed.
--
--In essence, it's an SQL "watchdog" loop to start and monitor SQL Agent Jobs
--
--Code from http://cc.davelozinski.com/code/sql-watchdog-loop-start-monitor-sql-agent-jobs
--
****************************************************************/
SET NOCOUNT ON 

-------- BEGIN ITEMS THAT NEED TO BE CONFIGURED --------

--The amount of time to wait before checking again 
--to see if the jobs are still running.
--Should be in hh:mm:ss format. 
DECLARE @WaitDelay VARCHAR(8) = '00:00:20'

--Job timeout. Eg, if the jobs are running longer than this, kill them.
DECLARE @TimeoutMinutes INT = 240

DECLARE @JobsToRunTable TABLE
(
    JobName NVARCHAR(128) NOT NULL,
    JobID UNIQUEIDENTIFIER NULL,
    Running INT NULL
)

--Insert the names of the SQL jobs here. Last two values should always be NULL at this point.
--Names need to match exactly, so best to copy/paste from the SQL Server Agent job name.
INSERT INTO @JobsToRunTable (JobName, JobID, Running) VALUES ('NameOfFirstSQLAgentJobToRun',NULL,NULL)
INSERT INTO @JobsToRunTable (JobName, JobID, Running) VALUES ('NameOfSecondSQLAgentJobToRun',NULL,NULL)
INSERT INTO @JobsToRunTable (JobName, JobID, Running) VALUES ('NameOfXSQLAgentJobToRun',NULL,NULL)

-------- NOTHING FROM HERE DOWN SHOULD NEED TO BE CONFIGURED --------

DECLARE @ExecutionStatusTable TABLE
(
    JobID UNIQUEIDENTIFIER PRIMARY KEY, -- Job ID which will be a guid
    LastRunDate INT, LastRunTime INT, -- Last run date and time
    NextRunDate INT, NextRunTime INT, -- Next run date and time
    NextRunScheduleID INT, -- an internal schedule id
    RequestedToRun INT, RequestSource INT, RequestSourceID VARCHAR(128),
    Running INT,    -- 0 or 1, 1 means the job is executing
    CurrentStep INT, -- which step is running
    CurrentRetryAttempt INT, -- retry attempt
    JobState INT -- 0 = Not idle or suspended, 1 = Executing, 2 = Waiting For Thread,
                     -- 3 = Between Retries, 4 = Idle, 5 = Suspended, 
                     -- 6 = WaitingForStepToFinish, 7 = PerformingCompletionActions
)

DECLARE @JobNameToRun NVARCHAR(128) = NULL
DECLARE @IsJobRunning BIT = 1
DECLARE @AreJobsRunning BIT = 1
DECLARE @job_owner sysname = SUSER_SNAME()
DECLARE @JobID UNIQUEIDENTIFIER = null
DECLARE @StartDateTime DATETIME = GETDATE()
DECLARE @CurrentDateTime DATETIME = null
DECLARE @ExecutionStatus INT = 0
DECLARE @MaxTimeExceeded BIT = 0

--Loop through and start every job
DECLARE dbCursor CURSOR FOR SELECT JobName FROM @JobsToRunTable
OPEN dbCursor FETCH NEXT FROM dbCursor INTO @JobNameToRun
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC [msdb].[dbo].sp_start_job @JobNameToRun
    FETCH NEXT FROM dbCursor INTO @JobNameToRun
END
CLOSE dbCursor
DEALLOCATE dbCursor

print '*****************************************************************'
print 'Jobs started. ' + CAST(@StartDateTime as varchar)
print '*****************************************************************'

--Debug (if needed)
--SELECT * FROM @JobsToRunTable

WHILE 1=1 AND @AreJobsRunning = 1
BEGIN

    --This has to be first with the delay to make sure the jobs
    --have time to actually start up and are recognized as 'running'
    WAITFOR DELAY @WaitDelay 

    --Reset for each loop iteration
    SET @AreJobsRunning = 0

    --Get the currently executing jobs by our user name
    INSERT INTO @ExecutionStatusTable
    EXECUTE [master].[dbo].xp_sqlagent_enum_jobs 1, @job_owner

    --Debug (if needed)
    --SELECT 'ExecutionStatusTable', * FROM @ExecutionStatusTable

    --select every job to see if it's running
    DECLARE dbCursor CURSOR FOR 
        SELECT x.[Running], x.[JobID], sj.name 
        FROM @ExecutionStatusTable x 
        INNER JOIN [msdb].[dbo].sysjobs sj ON sj.job_id = x.JobID
        INNER JOIN @JobsToRunTable jtr on sj.name = jtr.JobName
    OPEN dbCursor FETCH NEXT FROM dbCursor INTO @IsJobRunning, @JobID, @JobNameToRun

    --Debug (if needed)
    --SELECT x.[Running], x.[JobID], sj.name 
    --  FROM @ExecutionStatusTable x 
    --  INNER JOIN msdb.dbo.sysjobs sj ON sj.job_id = x.JobID
    --  INNER JOIN @JobsToRunTable jtr on sj.name = jtr.JobName

    WHILE @@FETCH_STATUS = 0
    BEGIN
        --bitwise operation to see if the loop should continue
        SET @AreJobsRunning = @AreJobsRunning | @IsJobRunning

        UPDATE @JobsToRunTable
        SET Running = @IsJobRunning, JobID = @JobID
        WHERE JobName = @JobNameToRun

        --Debug (if needed)
        --SELECT 'JobsToRun', * FROM @JobsToRunTable

        SET @CurrentDateTime=GETDATE()

        IF @IsJobRunning = 1
        BEGIN -- Job is running or finishing (not idle)

            IF DATEDIFF(mi, @StartDateTime, @CurrentDateTime) > @TimeoutMinutes
            BEGIN     
                print '*****************************************************************'
                print @JobNameToRun + ' exceeded timeout limit of ' + @TimeoutMinutes + ' minutes. Stopping.'
                --Stop the job
                EXEC [msdb].[dbo].sp_stop_job @job_name = @JobNameToRun
            END
            ELSE
            BEGIN
                print @JobNameToRun + ' running for ' + CONVERT(VARCHAR(25),DATEDIFF(mi, @StartDateTime, @CurrentDateTime)) + ' minute(s).'
            END
        END

        IF @IsJobRunning = 0 
        BEGIN
            --Job isn't running
            print '*****************************************************************'
            print @JobNameToRun + ' completed or did not run. ' + CAST(@CurrentDateTime as VARCHAR)
        END

        FETCH NEXT FROM dbCursor INTO @IsJobRunning, @JobID, @JobNameToRun

    END -- WHILE @@FETCH_STATUS = 0
    CLOSE dbCursor
    DEALLOCATE dbCursor

    --Clear out the table for the next loop iteration
    DELETE FROM @ExecutionStatusTable

    print '*****************************************************************'

END -- WHILE 1=1 AND @AreJobsRunning = 1

SET @CurrentDateTime = GETDATE()
print 'Finished at ' + CAST(@CurrentDateTime as varchar)
print CONVERT(VARCHAR(25),DATEDIFF(mi, @StartDateTime, @CurrentDateTime)) + ' minutes total run time.'
Free Coder 24
  • 933
  • 9
  • 12