One gotcha is that print() statements will not make it to the log file, so you need to use Spark's log4j logging functionality. In PySpark, I created a utility function that sends output to the log file, but also prints it to the notebook when the notebook is run manually:
# utility methods for logging
log4jLogger = sc._jvm.org.apache.log4j

# give a meaningful name to your logger (mine is CloudantRecommender)
LOGGER = log4jLogger.LogManager.getLogger("CloudantRecommender")

def info(*args):
    print(args)         # sends output to notebook
    LOGGER.info(args)   # sends output to kernel log file

def error(*args):
    print(args)         # sends output to notebook
    LOGGER.error(args)  # sends output to kernel log file
I use the function like this in my notebook:
info("some log output")
If I check the log files, I can see my log output is getting written:
! grep 'CloudantRecommender' $HOME/logs/notebook/*pyspark*
kernel-pyspark-20170105_164844.log:17/01/05 10:49:08 INFO CloudantRecommender: [Starting load from Cloudant: , 2017-01-05 10:49:08]
kernel-pyspark-20170105_164844.log:17/01/05 10:53:21 INFO CloudantRecommender: [Finished load from Cloudant: , 2017-01-05 10:53:21]
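Both the log lines above and the error handling below append a timestamp via a ts() helper. Its implementation isn't shown in this section; a minimal sketch, assuming it simply formats the current time, could be:

from datetime import datetime

def ts():
    # return the current time as a human-readable string, e.g. '2017-01-05 10:49:08'
    return datetime.now().strftime('%Y-%m-%d %H:%M:%S')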
Exceptions don't appear to get sent to the log file either, so you will need to wrap your code in a try/except block and log the error, e.g.
import traceback

try:
    # your spark code that may throw an exception
    ...
except Exception as e:
    # send the exception to the spark logger
    # (ts() is the timestamp helper sketched above)
    error(str(e), traceback.format_exc(), ts())
    raise e
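Putting it together, the load step that produced the log lines above might be wrapped something like this (load_from_cloudant() is a placeholder for whatever Spark code actually reads the data, not the real function name from the notebook):

try:
    info("Starting load from Cloudant: ", ts())
    df = load_from_cloudant()   # placeholder for the actual Cloudant read code
    info("Finished load from Cloudant: ", ts())
except Exception as e:
    error(str(e), traceback.format_exc(), ts())
    raise e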
CAUTION: Another gotcha that hit me during debugging is that scheduled jobs run a specific version of a notebook, so make sure you update the scheduled job when you save a new version of your notebook.