How to write a z/OS Health Check?

Question

I would like to write a health check for z/OS but am unclear on where to begin. Can anyone offer advice, examples, or direction?

Also, is it possible to write a Health Check in Unix System Services?

Have you reviewed the [documentation](https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.e0zl100/chap1.htm)? — cschneid, Mar 08 '18 at 19:07
@cschneid thanks for the link! I am making my way through the documentation. I am hoping to hear from someone who has been down this road before. — Jade Steffen, Mar 08 '18 at 19:37
I'm afraid that person is not me. Sounds cool though, good luck with it. If you don't get answers here, you might try the ibm-main listserv. — cschneid, Mar 08 '18 at 23:35
I have. What are you looking to do? Are you familiar with REXX? Have you looked at the samples? — Kevin McKenzie, Mar 10 '18 at 09:13
@JadeSteffen Are you trying to write a health check from scratch because the existing tools from IBM, CA, BMC, etc. are not good enough, or are you trying to devise a process to health-check a z/OS system using any tools available? If the latter, is there a specific part of z/OS you are interested in checking (you mention USS)? — Steve Ives, Mar 15 '18 at 08:53

Answer 1 · Steve Ives · answered Mar 9 '18 at 12:57

Write a simple REXX exec as follows:

/* REXX Health checker */

say 'Health check passed. System working.'

and then execute the EXEC. If you can't execute it, or if the message doesn't get printed, then the system is not working.

But seriously - exactly which part of z/OS are you health-checking? Do you want to know if the whole system is down, or just parts of it? Which parts: CICS, MQ, DB2, IMS, etc.? Are batch jobs queuing? Are CICS transactions running too slowly? Are your MQ queue depths too large or too small?

This is not a yes/no question. There are literally (and I mean literally in its literal sense) thousands of metrics and performance figures you can validate on a z/OS system - it's not a toy that is either up or down.

If you read the IBM Health Checker for z/OS User's Guide, you'll get some idea of what's involved.

Lots of people have been down this route before. Look up information on CA-Sysview, BMC Mainview, IBM's Omegamon - these are all very mature system monitors.

I suspect that you are looking at the mainframe as a remote system, and you want to know if it is 'up', i.e., whether it will respond to whatever request you are making of it. If you can explain what you want it to do for you, we might be able to devise a health check for your purpose.

Answer 2 · answered Mar 13 '18 at 14:28

First, start with the IBM Health Checker for z/OS User's Guide. It will point you to some samples in SYS1.SAMPLIB that you can use as a base. Specifically, start here.

As to your question about writing a Health Check under Unix System Services, it depends on what you want to do. Purely under USS, no. To have a Health Check, you need to register with the Health Checker address space and report status to it in a specific way. So the Health Check needs to be written in Metal C, assembler, or System REXX. (I'd recommend System REXX unless you're fluent in Metal C or assembler.) You'll also need a z/OS system programmer to install the check into a system library.

However, as documented in the System REXX reference, you can invoke USS services from within System REXX, and I believe from Metal C and assembler as well, so depending on what you're trying to do, you may still be able to write the check you need.

Kevin McKenzie

One word of caution about health checks that use UNIX System Services: since the Health Checker address space is essentially shared, it can be difficult to control the USS UID and so forth. Plus, health checks generally run APF-authorized in supervisor state, which doesn't always mix well with USS. For this reason, we usually recommend that USS-intensive health checks be implemented as "remote" checks in an address space under your control. – Valerie R Mar 13 '18 at 19:30
This isn’t writing a health check though, is it? It’s using a health check someone else (IBM) has written. The question was how to write a health check, not how to use one. – Steve Ives Mar 14 '18 at 18:47
No, it's pointers to documentation on how to write a health check, along with pointers to sample health checks that can be used as a base for future ones, as the way information is passed to and from HZSPROC is not straightforward. When I write health checks, the code in SYS1.SAMPLIB is what I use as a starting point. And, writing the health check is pointless if you can't get it installed, which is where the z/OS System Programmer comes into the picture. So it's specifically answering the question, which asked for examples and direction, provided by the samples and the documentation. – Kevin McKenzie Mar 15 '18 at 01:34

Answer 3 · answered Mar 14 '18 at 14:39

Jade, I saw your question and found this publication online that outlines using REXX to run a Health Check. It's not USS, but I hope it is useful to you.

http://ibmsystemsmag.com/mainframe/administrator/systemsmanagement/health_check_rexx/

PleaseHelpTheNewGuy
