
When I restart Spark after a long period of inactivity (3 days) using:

val ssc = StreamingContext.getOrCreate(checkpointDir, newStreamingContext _, createOnError = createOnError)

I see that the restart is painfully slow.

The Streaming tab takes 45 minutes to appear, which I take to mean that Spark has finished loading the checkpoint (it takes quite long to load the last batch from a checkpoint file).

After that, the tab shows 1000 batches with 0 events. When I restart after only a few minutes, it shows just the batches it missed (10 batches of 30 s for a downtime of about 5 minutes), and it loads "quickly".
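The 10-batch figure is what a simple model of checkpoint recovery predicts. Assuming, as I understand Spark Streaming's recovery logic (its JobGenerator logs a "Batches during down time" line on restart), that one batch is scheduled for every interval between the last checkpointed batch time and the restart time, the number of batches is the downtime divided by the batch interval: 300 s / 30 s = 10. The sketch below is plain Scala, not Spark code, and `downTimeBatches` is a hypothetical helper.

```scala
// Plain-Scala model (not Spark code) of the batches rescheduled after a
// restart from checkpoint. ASSUMPTION: one batch per interval from the last
// checkpointed batch time (inclusive) to the restart time (exclusive).
object DownTime {
  /** Hypothetical helper: batch times, in ms, scheduled for the down period. */
  def downTimeBatches(checkpointMs: Long, restartMs: Long, batchMs: Long): List[Long] =
    (checkpointMs until restartMs by batchMs).toList

  def main(args: Array[String]): Unit = {
    val batchMs = 30 * 1000L          // 30 s batch interval
    val fiveMinutesMs = 5 * 60 * 1000L
    val batches = downTimeBatches(0L, fiveMinutesMs, batchMs)
    println(batches.size)             // 10
  }
}
```

Under this model the number of recovered batches grows linearly with the downtime, which is consistent with a short restart being fast and a long one being slow.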

This makes me think that loading my checkpoint takes a long time because it loads these 1000 batches.

Since 1000 batches of 30 s do not cover 3 days, I wonder what happens once these 1000 batches are finished: will it resume at the current time, or load the other missed batches? Is this limit of 1000 configurable?
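For scale: 3 days at 30 s per batch is 8,640 batches, while 1000 batches cover only 30,000 s, about 8 h 20 min. One possible source of the round number (an assumption, not confirmed by the observations above) is the setting `spark.streaming.ui.retainedBatches`, whose documented default is 1000 and which limits how many batches the Streaming UI and status APIs remember; if that is the cause, 1000 would be a display cap rather than the number of batches actually recovered. The sketch is plain Scala arithmetic; `RetainedVsMissed` and its members are hypothetical names.

```scala
// Compare the 1000 batches shown in the UI with a 3-day downtime.
// ASSUMPTION: 1000 is the default of spark.streaming.ui.retainedBatches,
// i.e. a UI retention cap, not necessarily the number of recovered batches.
object RetainedVsMissed {
  val batchSeconds = 30L
  val downtimeSeconds = 3L * 24 * 60 * 60   // 3 days = 259,200 s
  val uiRetainedBatches = 1000L             // assumed UI cap

  /** Batches needed to cover the whole downtime. */
  val missedBatches: Long = downtimeSeconds / batchSeconds

  /** Wall-clock span covered by the retained batches, as (hours, minutes). */
  def retainedSpan: (Long, Long) = {
    val seconds = uiRetainedBatches * batchSeconds
    (seconds / 3600, (seconds % 3600) / 60)
  }

  def main(args: Array[String]): Unit = {
    println(s"missed batches: $missedBatches")           // missed batches: 8640
    val (h, m) = retainedSpan
    println(s"$uiRetainedBatches batches span ${h}h ${m}min") // 1000 batches span 8h 20min
  }
}
```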

Edit: After these 1000 batches nothing happens; the direct Kafka stream creates no new batches. I don't think this is the expected behavior, and I am hesitating over whether to open a Spark JIRA ticket about it.


Because problems never come alone, I also suspect these 1000 batches are loaded into driver memory.

Sometimes there is an OOM after a few batches. When there isn't, I see the Total Delay rising while the average processing time stays below the batch interval. This makes me think my driver is close to OOM and has difficulty sending batches to the executors.

Of course, when my stream is not created from a checkpoint, everything works well. So what happens when a stream starts from a checkpoint?


PS: The "0 events" batches do contain events, because they take as much time as my usual full batches and I can see the Kafka offsets increasing, so I think this is a display bug in the Spark UI.
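One way to double-check the "0 events" display without relying on the UI is to count records from each batch's Kafka offset ranges: the number of records is the sum over partitions of `untilOffset - fromOffset`. In Spark's direct Kafka integration these fields live on `OffsetRange`, obtained via `HasOffsetRanges` (the package differs between the Kafka 0.8 and 0.10 integrations). To keep the sketch runnable without Spark, it uses a hypothetical `PartitionRange` case class standing in for `OffsetRange`, and the offsets are made-up example values.

```scala
// Hypothetical stand-in for Spark's OffsetRange (topic, partition,
// fromOffset inclusive, untilOffset exclusive); not the Spark class itself.
final case class PartitionRange(topic: String, partition: Int,
                                fromOffset: Long, untilOffset: Long) {
  def count: Long = untilOffset - fromOffset
}

object BatchRecordCount {
  /** Records in one batch = sum of per-partition offset deltas. */
  def recordsInBatch(ranges: Seq[PartitionRange]): Long = ranges.map(_.count).sum

  def main(args: Array[String]): Unit = {
    // Made-up offsets for a two-partition topic, for illustration only.
    val batch = Seq(
      PartitionRange("events", 0, fromOffset = 1000L, untilOffset = 1500L),
      PartitionRange("events", 1, fromOffset = 2000L, untilOffset = 2300L)
    )
    println(recordsInBatch(batch)) // 800
  }
}
```

In the real job the ranges would come from `rdd.asInstanceOf[HasOffsetRanges].offsetRanges` inside `foreachRDD`, where `OffsetRange.count()` computes the same difference; logging that total per batch would show whether the "0 events" batches really carry data.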

crak
  • Interesting - I will watch this for answers. We are also running a streaming application written in Spark 2.0, but on AWS EMR, and loading from checkpoints is always painfully slow - even if we restart right away. – Glennie Helles Sindholt Sep 20 '16 at 07:36
  • Yes, the Spark UI shows 0 records after a restart, but the active records are being processed; it's a display problem in the UI after a restart. Rest assured it has picked up from the checkpoints and is processing. – Steven Park Mar 04 '19 at 19:22
