
I am running a production job in Databricks on a job cluster. During environment initialization I created a notebook that contains a lot of print statements, which causes the job cluster to exceed the output size limit and the job to fail.

I have tried to configure this parameter

spark.databricks.driver.disableScalaOutput true

The above parameter does not seem to work. Is there any other way to tackle this issue?


1 Answer


a notebook that contains a lot of print statements, which causes the job cluster to exceed the output size limit and the job to fail.

Based on the given information, I tried to reproduce the issue by writing 999 print statements per notebook across 3 notebooks, i.e. 2997 print statements in total, and it works fine for me.
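To give a sense of scale, here is roughly what the repro looks like (a minimal sketch, assuming it runs as a Databricks notebook cell; the exact content of each printed line does not matter):

    # Repro sketch: 999 print statements per notebook, repeated across
    # 3 notebooks (2997 lines of stdout in total). This stays well below
    # the 20 MB notebook-output limit, so the job completes normally.
    for i in range(999):
        print(f"print statement number {i}")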


The possible reasons for the error could be:

  • If you are using multiple display(), displayHTML(), or show() commands in your notebook, this increases the amount of output. Once the output exceeds 20 MB, the error occurs.

  • If you are using multiple print() commands in your notebook, this can increase the output to stdout. Once the output exceeds 20 MB, the error occurs.

  • If you are running a streaming job and enable awaitAnyTermination in the cluster’s Spark config, it tries to fetch the entire output in a single request. If this exceeds 20 MB, the error occurs.

So, make sure the total output doesn’t exceed 20 MB.

Solution:

Remove any unnecessary display(), displayHTML(), print(), and show() commands from your notebook. These can be useful for debugging, but they are not recommended for production jobs.
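If you still want some of that output while debugging interactively, one simple option is to gate it behind a flag so that production runs emit nothing to stdout (a sketch; the DEBUG flag and debug_print helper are just illustrative names, not Databricks features):

    # Gate debug output behind a flag so job runs stay quiet on stdout.
    DEBUG = False  # flip to True only when debugging interactively

    def debug_print(*args):
        if DEBUG:
            print(*args)

    debug_print("suppressed when DEBUG is False")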

If your job output is exceeding the 20 MB limit, try redirecting your logs to log4j or disabling stdout by setting spark.databricks.driver.disableScalaOutput true in the cluster’s Spark config.
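For the log4j route, a minimal sketch in a Python notebook looks like this (it assumes a Databricks notebook where the SparkContext is exposed as sc; the logger name "env_init" is arbitrary). Messages sent this way land in the driver's log4j logs rather than in the notebook output, so they do not count toward the 20 MB limit:

    # Write messages to the driver's log4j logs instead of stdout.
    log4j = sc._jvm.org.apache.log4j
    logger = log4j.LogManager.getLogger("env_init")
    logger.info("environment initialization started")
    logger.warn("this goes to the cluster driver logs, not to the notebook output")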

Utkarsh Pal