3

We're using a job scheduling system that runs on top of DBMS_JOB. It uses a master job to create one-time jobs. We deploy the same set of jobs to all our clients, but can specify which jobs should only run at certain clients.

We get occasional problems with a process run by a job hanging. The main cause of this is UTL_TCP not timing out when it does get an expected response. I want to be able to kill those jobs so that they can run again.

I'm looking at creating a new job that kill any of these one-time jobs that have been running for longer than a certain time.

We're stuck with Oracle 10g for a while yet, so I'm limited to what that can do.

There's an article that seems to cover most of this at

http://it.toolbox.com/blogs/database-solutions/killing-the-oracle-dbms_job-6498

I have a feeling that this is not going to cover all eventualities, including:

  1. We run can jobs as several different users and a user can only break/remove jobs they created. I believe that I may be able to use DBMS_IJOB to get around that, but I need to get the DBA to let me execute it.
  2. We have Oracle RAC systems. I understand 10g limits ALTER SYSTEM KILL SESSION to killing sessions on the current instance. I could arrange for all jobs to run on the same instance, but I've not tried that yet.

Anything else I should consider? Stack Overflow needs a definitive answer on this.

steevc
  • 662
  • 2
  • 10
  • 19
  • 3
    Your statement "but I need to get the DBA to let me execute it" makes it sound like you haven't run the problem to ground with your DBA? This is an issue strongly in the DBA category, which I (speaking as a DBA) would advise you deal with directly with your DBA before working too hard on a programmatic solution. – Rob Oct 06 '09 at 14:45
  • 1
    I've upvoted Rob's comment, and add that I would investigate setting resource limits to deal with this situation. In my experience, and as the linked article mentions, you'll sometimes need host OS access and permissions to kill the ora_qnnn processes that run the jobs to effectively kill the job so you may not be able to entirely stop things with a database program anyway. – dpbradley Oct 06 '09 at 15:28
  • +1 @rob. In my opinion, this is a constructive answer to the question +1 @dpbradley for mentioning the possibility of OS side activity – Juergen Hartelt Oct 06 '09 at 20:08
  • I'm getting help from our DBAs on how I should do this, but it's something we want to run on client systems to automate killing run-away jobs so that our DBA doesn't have to go in there and fix it each time. It's a pretty rare occurrence anyway. – steevc Oct 07 '09 at 15:42

2 Answers2

1
  • You can get the PID from the job tables and kill the stuck process via the normal OS commands.

  • You can kill jobs on any instance. On 10g, you need to know on which instance the stuck job is running, and connect to that instance:

To get your instance and pid:

select inst_id, process from gv$session where ...

Connect to a specific instance:

sqplus admin@node3 as sysdba
alter system kill session ...
Mark Harrison
  • 297,451
  • 125
  • 333
  • 465
1

There are more ways to kill a session on oracle. Depends on your plattform. Running on unix sessions (background jobs too) are represented by processes. Killing the process, kills the session. On windows sessions are represented by a thread. Killing the thread using orakill, kills the session. The process (or thread) id is stored in gv$process.

Christian13467
  • 5,324
  • 31
  • 34
  • The Oracle documents on UTL_TCP say that timeout is not implemented on write operations. Still not there in 11g. If they fixed that then I might not need to kill jobs. – steevc Oct 07 '09 at 15:42