
I have a Spark Streaming job running, and the streaming input is around 50 MB every 3 hours. The job processed a few files in the first few hours, but then suddenly failed with the error below. No input was being received when the error occurred, and the Spark job was unable to create a new thread.

I cache the RDDs in the business logic, but that should not be a problem, since a new thread is created for every new input file, so the cached RDD should be destroyed when the thread ends.
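To illustrate, the caching pattern is roughly like this (a simplified sketch, not the actual job; eventStream and the count are stand-ins for the real business logic):

    import org.apache.spark.streaming.dstream.DStream

    // Simplified sketch of the per-batch caching pattern:
    // cache the batch RDD, run the processing, then release the cache
    // so it does not outlive the batch.
    def handleBatches(eventStream: DStream[String]): Unit = {
      eventStream.foreachRDD { rdd =>
        val cached = rdd.cache()    // cached for reuse within this batch
        val count  = cached.count() // stand-in for the real processing
        println(s"processed $count events in this batch")
        cached.unpersist()          // drop the cache once the batch is done
      }
    }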

Can anyone help me with this? I have tried a lot but could not work out the cause.

Error Message:

17/12/21 15:32:31 INFO ContextCleaner: Cleaned RDD 9612
17/12/21 15:32:32 INFO CheckpointWriter: Saving checkpoint for time 1513869975000 ms to file 'hdfs://EAPROD/EA/supplychain/process/checkpoints/logistics/elf/eventsCheckpoint/checkpoint-1513869990000'
Exception in thread "dispatcher-event-loop-12" Exception in thread "dispatcher-event-loop-31" java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
        at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1018)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-28-thread-1" 17/12/21 15:32:32 INFO CheckpointWriter: Submitted checkpoint of time 1513869975000 ms writer queue
Sankarlal

1 Answer


Monitor your application closely using the Linux tool stack. This is a case of a user/system limit being enforced by the Linux kernel: your process is killed because it has exceeded the allowed number of open threads. You can raise that limit, but you may also be leaking threads in your code.
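On the Linux side, ps -o nlwp -p <pid> reports a process's thread count and ulimit -u shows the per-user limit on processes/threads. From inside the JVM, something along these lines (a rough sketch; the 30-second interval and plain println are placeholders) makes a thread leak visible over time:

    import java.lang.management.ManagementFactory

    // Sketch: log the JVM's live thread count periodically so a leak shows up
    // as a steadily growing number in the driver or executor logs.
    val threadBean = ManagementFactory.getThreadMXBean
    val monitor = new Thread(new Runnable {
      def run(): Unit = {
        while (true) {
          println(s"live threads: ${threadBean.getThreadCount}, peak: ${threadBean.getPeakThreadCount}")
          Thread.sleep(30000)
        }
      }
    })
    monitor.setDaemon(true)
    monitor.start()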

See for example this answer on how to manage the system and user limits.

Rick Moritz
  • Thanks for the input. I understand that the total number of threads is exceeding the limit. But will the streaming job keep increasing the number of threads? And are old threads that have finished executing also being held? – Sankarlal Dec 22 '17 at 12:13
  • That is for you to find out, and it depends on your logic. I have added a means of monitoring to my answer. – Rick Moritz Dec 22 '17 at 13:15