
I have deployed a Flink cluster with the following parallelism configuration:

jobmanager.heap.mb: 2048
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 5
parallelism.default: 2

But whenever I try to run any example or jar, even with the `-p` flag, I receive the following error:

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. 
Task to schedule: < Attempt #1 (Source: Custom Source -> Sink: Unnamed (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 22f48c24254702e4d1674069e455c81a > in sharing group < SlotSharingGroup [22f48c24254702e4d1674069e455c81a] >. Resources available to scheduler: 
Number of instances=0, total number of slots=0, available slots=0
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:255)
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
        at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:303)
        at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:453)
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:326)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:742)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:889)
        at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.call(FixedDelayRestartStrategy.java:80)
        at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Which should not come as a surprise, as the dashboard shows:

[dashboard screenshot: 0 task managers, 0 task slots available]

I tried restarting the cluster several times, but it does not seem to pick up the configuration.

alessiosavi
Tomasz Sosiński
  • Do you see task managers in the `Task Managers` section of the main menu? It seems like you don't have running task managers, or their ports are blocked by a firewall. Try checking the task manager logs (in the /log directory) and your firewall settings. – Maxim Mar 25 '16 at 08:25
  • I actually don't see any managers in the `Task Managers` section. The task manager log shows the following error: `Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/flink/runtime/leaderretrieval/LeaderRetrievalListener : Unsupported major.minor version 51.0` – Tomasz Sosiński Mar 25 '16 at 09:50
  • Which version of the JRE/JDK are you using on the task manager node? It seems to be older than Java 7 (51 in [internal format](http://stackoverflow.com/a/11432195/2398521)). – Maxim Mar 25 '16 at 10:18
  • It turned out I had Java 1.8 on my JobManager machine and 1.6 on the TaskManager machines. After updating to Java 1.8 on each machine, the Flink cluster works properly (thank you maxd). However, the Dashboard UI does not reflect the new cluster and its configuration; can I restart it somehow? – Tomasz Sosiński Mar 25 '16 at 10:47
  • Did you set valid IPs in the `/conf/masters` and `/conf/slaves` files as [described in the documentation](https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/setup_quickstart.html#cluster-setup)? – Maxim Mar 25 '16 at 11:49
  • Ok, everything seems to be working. I still can't reach the UI, but when I `curl` from localhost it returns the appropriate value, so it seems I have an iptables routing issue. Thank you for the help, maxd! – Tomasz Sosiński Mar 25 '16 at 14:13
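The `major.minor version 51.0` diagnosis in the comments can be verified directly from a class file: bytes 6–7 of any `.class` file store the class-file major version (51 = Java 7, 52 = Java 8). A minimal sketch, using a fabricated header for the demo (against a real cluster you would point `od` at a class extracted from the Flink dist jar instead):

```shell
# Bytes 6-7 of a .class file store the class-file major version (51 = Java 7, 52 = Java 8).
# Fabricate a minimal header for the demo: magic CAFEBABE, minor 0, major 52 (octal escapes).
printf '\312\376\272\276\000\000\000\064' > /tmp/demo.class

# Read the two version bytes as unsigned decimals and combine them big-endian.
set -- $(od -An -j6 -N2 -tu1 /tmp/demo.class)
major=$(( $1 * 256 + $2 ))
echo "class file major version: $major"   # 52 -> compiled for Java 8
```

On a real node you could extract a class first, e.g. `unzip -p lib/flink-dist*.jar some/Class.class > /tmp/demo.class` (path illustrative), and compare the result with `java -version` on each machine.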

3 Answers


I ran into the same issue, and it reminded me of a problem I once had with Spark: I had newly installed JDK 11 for testing, which changed my JAVA_HOME env var to /Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home.

So I set JAVA_HOME back to JDK 8 with: `export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home`

and everything ran smoothly. That path is for my Mac; find your own JAVA_HOME. Hope that helps.
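As a sketch (the JDK path is this answer's Mac install, so substitute your own; `env.java.home` is Flink's config key for pinning the JDK in `conf/flink-conf.yaml`):

```shell
# Point Flink's start scripts at JDK 8 instead of whatever is first on the PATH
# (path is illustrative -- substitute your own install location).
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home

# On macOS the path can be resolved rather than hard-coded:
#   export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
# The JDK can also be pinned per-cluster in conf/flink-conf.yaml:
#   env.java.home: /path/to/jdk1.8
echo "$JAVA_HOME"
```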

Jasurbek
GoForce5500

The exception simply means there is no task manager, hence no slots available to run the job. A task manager can go down for many reasons, e.g. a runtime exception or a misconfiguration; check the logs for the exact cause. Restart the cluster, and once task managers appear in the dashboard, run the job again. You can also define a proper restart strategy in the config, such as fixed-delay restart, so that the job retries in case of a genuine failure.
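A fixed-delay strategy is set in `flink-conf.yaml`; a sketch with example values (key names per Flink's restart-strategy configuration):

```shell
# Demo writes to /tmp; on a real cluster these lines go into conf/flink-conf.yaml.
# With fixed-delay, a failed job is retried 3 times with 10 s between attempts.
cat > /tmp/flink-conf-snippet.yaml <<'EOF'
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
EOF
grep 'restart-strategy' /tmp/flink-conf-snippet.yaml
```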

Mudit bhaintwal

Finally I found the solution to this Flink issue in my case. First I will explain the root cause, then the solution.

Root cause: the Java Virtual Machine could not be created.

Check the Flink logs and tail the task executor log:

`tail -500f flink-root-taskexecutor-3-osboxes.out`

I found the following entries:

Invalid maximum direct memory size: -XX:MaxDirectMemorySize=8388607T
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

This happens because the Java version is wrong: the OS is 64-bit, but I had installed a 32-bit JDK.

Solution:

  1. Install the correct 64-bit JDK 1.8 (after installation, the error in the task executor disappeared).
  2. Edit the flink-conf.yaml file and update: `taskmanager.numberOfTaskSlots: 10` and `parallelism.default: 1`
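The scheduler arithmetic behind the original exception is simple: a job fits only if its parallelism is at most taskmanagers × slots per task manager. A sketch (the task-manager count of 3 is an assumed cluster size; the other numbers mirror the config values above):

```shell
taskmanagers=3        # assumed number of live TaskManager processes
slots_per_tm=10       # taskmanager.numberOfTaskSlots
parallelism=1         # parallelism.default (or the -p value at submit time)

total_slots=$(( taskmanagers * slots_per_tm ))
if [ "$parallelism" -le "$total_slots" ]; then
  echo "OK: job needs $parallelism of $total_slots slots"
else
  echo "NoResourceAvailableException expected: need $parallelism, have $total_slots"
fi
```

With no task managers running, total_slots is 0 and no job can be scheduled, which matches the `Number of instances=0, total number of slots=0` in the question's stack trace.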

My problem was resolved and the Flink cluster runs perfectly, both locally and in the cloud.


Rajeev Rathor