
I would like more information when debugging my Spark notebook. I have found some log files:

!ls $HOME/notebook/logs/

The files are:

bootstrap-nnnnnnnn_nnnnnn.log
jupyter-nnnnnnnn_nnnnnn.log   
kernel-pyspark-nnnnnnnn_nnnnnn.log
kernel-scala-nnnnnnnn_nnnnnn.log
logs-nnnnnnnn.tgz
monitor-nnnnnnnn_nnnnnn.log
spark160master-ego.log

Which applications log to these files and what information is written to each of these files?

Sumit Goyal
Chris Snow

2 Answers


When debugging notebooks, the kernel-*-*.log files are the ones you're looking for.

In logical order...

  1. bootstrap-*.log is written when the service starts. There is one file for each start; the timestamp indicates when the service started. It contains output from the startup script, which initializes the user environment, creates kernel specs, prepares the Spark config, and the like.

  2. bootstrap-*_allday.log has a record for each service start and stop on that day.

  3. jupyter-*.log contains output from the Jupyter server. After the initializations from bootstrap-*.log are done, the Jupyter server is started. That's when this file is created. You'll see log entries when notebook kernels are started or stopped, and when a notebook is saved.

  4. monitor-*.log contains output from a monitoring script that is started with the service. The monitoring script has to detect on which port the Jupyter server is listening. Afterwards, it keeps an eye on service activity and shuts down the service when it's been idle too long.

  5. kernel-*-*.log contains output from notebook kernels. Every kernel gets a separate log file; the timestamp indicates when the kernel started. The second word in the filename indicates the type of kernel.

  6. spark*-ego.log contains output from Spark job scheduling. It's used by the monitoring script to detect whether Spark is still active while the notebook kernels are idle.

  7. logs-*.tgz contains the archived logs for that day. Archives are deleted automatically after a few days.
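Since the timestamps are embedded in the filenames (nnnnnnnn_nnnnnn), a plain lexicographic sort puts the newest log last. Here is a small sketch for picking out the most recent kernel log to inspect; the directory is the one from the question, so adjust it if your setup differs:

```python
import glob
import os

def newest_kernel_log(log_dir, kernel="pyspark"):
    """Return the path of the newest log for the given kernel type, or None.

    The nnnnnnnn_nnnnnn timestamp in the filename sorts lexicographically,
    so a plain sort puts the newest file last.
    """
    pattern = os.path.join(log_dir, "kernel-%s-*.log" % kernel)
    logs = sorted(glob.glob(pattern))
    return logs[-1] if logs else None

# In a notebook, using the directory from the question:
# newest_kernel_log(os.path.expanduser("~/notebook/logs"))
```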

Roland Weber

With the recently enabled "environment" feature in DSX, the logs have moved to directory /var/pod/logs/. You will still see the kernel-*-*.log and jupyter-*.log files for your current session. However, they're not useful for debugging.

In the Spark as a Service backend, each kernel has a Spark driver process which logs to the kernel-*-*.log file. The environment feature comes without Spark, and the kernel itself does not generate output for the log file.

Roland Weber