
I have freshly installed an application on Solaris 5.10. When I check with ps -ef | grep hyperic | grep agent, the processes are up and running. But when I check the status with the svcs hyperic-agent command, the output shows that the agent is in maintenance mode. The application is working fine and I don't have any issues with it. Please help.

myaut
manu endla
  • Maybe your starter doesn't exit with zero (`SMF_EXIT_OK`) status? Please check the logs for the service (their location is shown in the output of `svcs -x hyperic-agent`). – myaut Apr 15 '15 at 10:18
  • Thanks a million for your reply! Here is the log snippet: Oracle Corporation SunOS 5.10 Generic Patch January 2005 -n Starting HQ Agent... -n . -n . -n . running (3314). Oracle Corporation SunOS 5.10 Generic Patch January 2005 3671 [ Apr 14 10:18:01 Method "start" exited with status 0 ] ... So the start method exited with zero. – manu endla Apr 15 '15 at 12:01
  • Well, SMF monitors an application in many ways: processes forking and exiting, signals being delivered, and so on. Probably one of those bad events was noticed and SMF marked the service as maintenance. Such events should be masked in the application's manifest. – myaut Apr 15 '15 at 12:06
  • The subsystem that actually gives SMF these facilities is System Contracts. You may try to `clear` your application's status, carefully restart it, and issue `svcs -v hyperic-agent` to get the CTID (contract ID) of your service, then run `ctwatch CTID` to track those events (provided the service isn't already marked as maintenance). – myaut Apr 15 '15 at 12:15
  • I greatly appreciate the response. After a clear and a careful restart, hurrah! I got the CTID. Here is the output, but after some time the status goes into the maintenance state again: root@rhmwsoss:/opt/hyperic-agent/agent-4.6.6.1-EE/bundles/agent-4.6.6.1/bin# ctwatch 37211 CTID EVID CRIT ACK CTTYPE SUMMARY 37211 28052 crit no process contract empty – manu endla Apr 15 '15 at 13:17

1 Answer


There are several reasons that lead to that behavior:

  • The starter (the start/exec property of the service) returned a status different from SMF_EXIT_OK (zero). In that case, check the logs:

     # svcs -x ssh
     ...
     See: /var/svc/log/network-ssh:default.log
    

    In the log you may see a message like the following, which means the start script failed or is incorrectly written:

     [ Aug 11 18:40:30 Method "start" exited with status 96 ]
    
  • Another reason is that the service faults while it is running (i.e. one of its processes dumps core or receives a kill signal, or all of its processes exit), as described here: https://blogs.oracle.com/lianep/entry/smf_5_fault_retry_models

    The subsystem that gives SMF these monitoring facilities is System Contracts. You can determine the contract ID of an online service with svcs -v (the CTID field):

    # svcs -vp svc:/network/smtp:sendmail
    STATE          NSTATE        STIME    CTID   FMRI
    online         -             Apr_14       68 svc:/network/smtp:sendmail
                Apr_14       1679 sendmail
                Apr_14       1681 sendmail
    

    Then watch its events with ctwatch:

    # ctwatch 68
    CTID    EVID    CRIT ACK CTTYPE   SUMMARY
    68      28      crit no  process  contract empty
    

    From there, you have two options:

    • There is a real problem with the service, so it eventually faults. In that case, debug the application.

    • This is normal behavior for the service, so you should edit and re-import the service manifest to make SMF less paranoid, i.e. configure the ignore_error and duration properties.
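The log check from the first case can be scripted. Below is a minimal sketch, assuming the log records start-method results in the `Method "start" exited with status N` format shown above; `smf_start_statuses` and `smf_last_start_ok` are hypothetical helper names, not SMF commands. Define them in a ksh or bash session, since the legacy Solaris 10 /bin/sh does not understand `$(...)`.

```shell
# Hypothetical helper: print the exit status of every start-method run
# recorded in an SMF service log, one per line, oldest first.
# Matches lines such as:
#   [ Aug 11 18:40:30 Method "start" exited with status 96 ]
smf_start_statuses() {
    sed -n 's/.*Method "start" exited with status \([0-9][0-9]*\).*/\1/p' "$1"
}

# Hypothetical helper: succeed only if the most recent start-method run
# in the log exited with status 0 (SMF_EXIT_OK). Fails if it exited
# non-zero or if no start-method result is recorded at all.
smf_last_start_ok() {
    last=$(smf_start_statuses "$1" | sed -n '$p')
    [ "$last" = "0" ]
}
```

For example, smf_last_start_ok /var/svc/log/network-ssh:default.log || echo "start method failed", taking the log path from the See: line printed by svcs -x.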
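The second option can also be tried without re-importing the manifest, by changing the service's properties in the repository with svccfg. This swaps in svccfg for the manifest edit described above and is a minimal sketch, not a tested recipe: `relax_smf_faults` and `run` are hypothetical helper names, `core,signal` is one choice among the values svc.startd accepts for ignore_error, and the argument is assumed to be the full instance FMRI printed by svcs -l hyperic-agent. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: tell svc.startd to ignore core dumps and signals
# in the service's processes instead of treating them as faults.
# Usage: relax_smf_faults <full instance FMRI of the service>
relax_smf_faults() {
    fmri=$1
    # Create the startd property group; if the manifest already defines it,
    # svccfg reports an error and the function carries on.
    run svccfg -s "$fmri" addpg startd framework
    run svccfg -s "$fmri" setprop startd/ignore_error = astring: core,signal
    # Push the new configuration into the running snapshot.
    run svcadm refresh "$fmri"
    # Take the service out of maintenance so it is started again.
    run svcadm clear "$fmri"
}
```

Note that ignore_error only masks core dumps and signals. If the event reported by ctwatch is contract empty (all processes in the contract exited), the relevant property is startd/duration instead; setting it to transient tells svc.startd not to expect any process to keep running, which is only appropriate if the start method legitimately leaves nothing behind in the contract.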

myaut
  • Thanks a lot for the info! It really helped me. In my case the start method exits with zero, but I see some error statements in the logs that I am trying to fix. Anyway, thanks again! – manu endla Apr 16 '15 at 09:07