Spark cluster idle most of the time - Databricks

Question

I'm working on several jobs on Databricks, and this is one of them. The process reads data from an S3 bucket mounted on DBFS, processes it, and stores the result under another S3 path. Now it just sits there and I don't understand what it's doing: no network activity, no CPU.

I went into the SQL tab of the Spark UI and saw this:

For ID 12:

For ID 14, this step is not there:

And this under Active Stages:

For those 10 minutes (and counting), this is the only thing in the log:

22/02/10 19:24:20 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0

Any idea how I can figure out what's happening here? Thanks!
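One way to narrow this down is to find exactly when the driver stopped logging anything other than the load-average heartbeat, and what the last meaningful line before that was. Below is a minimal sketch, assuming the driver's log4j output has been saved to a text file and that lines start with the `yy/MM/dd HH:mm:ss` timestamp seen in the log line above; the function name and the two-minute default threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timedelta

# log4j lines from the driver start with a "yy/MM/dd HH:mm:ss" timestamp, e.g.
# "22/02/10 19:24:20 INFO ClusterLoadAvgHelper: Current cluster load: 1, ..."
TIMESTAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, threshold=timedelta(minutes=2), ignore=("ClusterLoadAvgHelper",)):
    """Return (line_before, line_after, gap) for each pair of consecutive
    timestamped lines that are at least `threshold` apart. Lines from loggers
    listed in `ignore` (periodic heartbeats) are skipped, so a gap means
    nothing else was logged in between."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        match = TIMESTAMP.match(line)
        # Skip continuation lines (e.g. stack traces) and heartbeat noise.
        if not match or any(name in line for name in ignore):
            continue
        current = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
        if prev_time is not None and current - prev_time >= threshold:
            gaps.append((prev_line, line, current - prev_time))
        prev_time, prev_line = current, line
    return gaps
```

For example, `find_gaps(open("driver-log4j.txt"))` (hypothetical file name) returns every quiet stretch of two minutes or more together with the last line logged before it, which is often the best hint about what the driver was waiting on.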

I'm also facing the same issue. Did you find a solution? - J_V, Jun 02 '22 at 07:33
For me it looks like some sort of timeout: the hang usually lasts almost exactly 120 seconds. After that I sometimes see log lines like this: "INFO TransportClientFactory: Found inactive connection to /10.101.13.197:40759, creating a new one." - bastian, Dec 13 '22 at 14:49
It was a MERGE without partition pruning, my bad. Looking at it now, it makes a lot of sense. Thanks! - Alejandro, Jul 24 '23 at 15:17
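The fix mentioned in the last comment, giving the MERGE a partition predicate so Delta can prune, can be sketched as a small Python helper that only builds the SQL text. The table names (`events`, `updates`), the key column `id` and the partition column `date` are hypothetical, and the sketch assumes the caller already knows which target partitions the incoming batch touches (for example from a `SELECT DISTINCT date` on the source). Delta Lake's documentation recommends adding such known constraints to the match condition so the merge searches only the relevant partitions instead of the whole table.

```python
def build_merge_sql(target, source, keys, partition_col, partition_values):
    """Build a Delta Lake MERGE statement whose ON clause also restricts the
    target's partition column, so only the listed partitions are searched."""
    # Keep the sketch simple: accept only plain values (e.g. dates) that need
    # no escaping inside a SQL string literal.
    for v in partition_values:
        if "'" in str(v) or "\\" in str(v):
            raise ValueError(f"unsupported partition value: {v!r}")
    on_keys = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    quoted = ", ".join(f"'{v}'" for v in partition_values)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        # The partition predicate is on the target alias `t`: it is the target
        # table's files that MERGE has to scan for matches.
        f"ON {on_keys} AND t.{partition_col} IN ({quoted})\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

In a Databricks notebook the result would then be run with something like `spark.sql(build_merge_sql("events", "updates", ["id"], "date", ["2022-02-10"]))`. A filter applied only to the source side would not have the same effect, because it does not restrict which target files are scanned.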

Answer

Check the Spark driver logs: stderr, stdout, and the log4j output. The job may be stuck in garbage collection; look for GC Allocation Failure entries.
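To check the GC theory, a small scan of the driver's stdout can tally GC events. This is a sketch that assumes Java 8-style GC logging lines such as `[GC (Allocation Failure) ...` and `[Full GC (...) ...` ending in `[Times: ... real=N secs]`; the exact format varies with the JVM version and GC flags, so the patterns may need adjusting. An occasional `GC (Allocation Failure)` is just a normal young collection; what would support the theory is many back-to-back `Full GC` events, or a large total pause time, during the minutes the job looked idle.

```python
import re

# Assumed Java 8-style GC line, e.g.
# "[GC (Allocation Failure) [PSYoungGen: ...] ..., 0.01 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]"
GC_EVENT = re.compile(r"\[(Full GC|GC) \(([^)]*)\)")
REAL_TIME = re.compile(r"real=([\d.]+) secs")


def summarize_gc(lines):
    """Count GC events by (kind, cause) and total their wall-clock pause time."""
    counts = {}
    pause_seconds = 0.0
    for line in lines:
        event = GC_EVENT.search(line)
        if not event:
            continue  # not a GC line (ordinary log output)
        key = (event.group(1), event.group(2))
        counts[key] = counts.get(key, 0) + 1
        real = REAL_TIME.search(line)
        if real:
            pause_seconds += float(real.group(1))
    return counts, pause_seconds
```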

You can also check the Executors tab in the Spark UI. It can show you which part of the code Spark is stuck on (the Thread Dump link for each executor shows what it is currently running).
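The figures on the Executors tab are also exposed by Spark's monitoring REST API (`/api/v1/applications/<app-id>/executors`), which is handy for polling a job that looks stuck. The sketch below assumes the API is reachable at `base_url` (for a plain Spark driver that is typically `http://localhost:4040`; on Databricks the UI sits behind the workspace proxy, so the URL will differ) and reads only the `id`, `activeTasks`, `totalDuration` and `totalGCTime` fields of each executor summary.

```python
import json
from urllib.request import urlopen


def fetch_executors(base_url, app_id):
    """Fetch executor summaries from Spark's monitoring REST API.
    `base_url` is an assumption, e.g. "http://localhost:4040"."""
    with urlopen(f"{base_url}/api/v1/applications/{app_id}/executors") as resp:
        return json.load(resp)


def summarize_executors(executors):
    """One row per executor: (id, active tasks, share of task time spent in GC)."""
    rows = []
    for e in executors:
        duration = e.get("totalDuration", 0)
        gc_share = e.get("totalGCTime", 0) / duration if duration else 0.0
        rows.append((e["id"], e.get("activeTasks", 0), round(gc_share, 3)))
    return rows
```

Executors with active tasks and a GC share close to 1 point back at the GC explanation; executors with active tasks and little GC are the ones whose Thread Dump is worth reading. If every executor reports zero active tasks while the job is still running, the wait is most likely on the driver side, so the driver logs are the place to look.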

answered Mar 28 '22 at 05:03

Anmol Deep
