I'm running a 3-node embedded Hazelcast Jet cluster, and the following error frequently appears in the console. What could be the reason?

 [jet] [3.0] Execution of job '15ba-4fbe-1b73-9ed1', execution 61d7-46eb-5875-8799 failed after 60,112 ms
    com.hazelcast.jet.JetException: Exception in ProcessorTasklet{streamKafka#1}: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
            at com.hazelcast.jet.impl.execution.TaskletExecutionService$BlockingWorker.run(TaskletExecutionService.java:250)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            at ------ submitted from ------.(Unknown Source)
            at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:126)
            at com.hazelcast.spi.impl.AbstractInvocationFuture$1.run(AbstractInvocationFuture.java:251)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
            at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
            at ------ submitted from ------.(Unknown Source)
            at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:126)
            at com.hazelcast.spi.impl.AbstractInvocationFuture$1.run(AbstractInvocationFuture.java:251)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
            at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
    Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

May 14, 2019 5:36:13 AM com.hazelcast.jet.impl.MasterJobContext
SEVERE: [127.0.0.1]:5701 [jet] [3.0] Execution of job '4940-dffe-4fd6-2f43', execution 2b9a-1f3d-4ecc-e116 failed after 60,209 ms
com.hazelcast.jet.JetException: Exception in ProcessorTasklet{streamKafka#1}: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
        at com.hazelcast.jet.impl.execution.TaskletExecutionService$BlockingWorker.run(TaskletExecutionService.java:250)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
        at ------ submitted from ------.(Unknown Source)
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:126)
        at com.hazelcast.spi.impl.AbstractInvocationFuture$1.run(AbstractInvocationFuture.java:251)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

Can someone help me understand?

The data in the Kafka source and in the sink is also inconsistent.

srikanth
  • Looks like `KafkaConsumer` fails to get the metadata; the stack trace of the *cause* is missing. The issue is likely not related to Jet. This might be an answer: https://stackoverflow.com/a/55277943/952135. The same issue seems to be reported often for many systems that use Kafka. – Oliv May 13 '19 at 14:41
  • @Oliv Not much information though, updated the question with full stack trace. – srikanth May 14 '19 at 06:09
  • The stack trace for the cause is still missing. I mean the lines after `Caused by: org.apache.kafka.common.errors.TimeoutException` – Oliv May 15 '19 at 07:13
  • @Oliv That's all that is displayed in the console. – srikanth May 15 '19 at 08:56
  • Looked at other reports, in all of them the stacktrace is missing for this exception. It's weird. – Oliv May 16 '19 at 08:37

1 Answer

This is very likely a configuration error. Judging from multiple reports of this issue, the cause can be a wrong broker URL, an incorrect SSL configuration, a network failure, or something similar. The Kafka client doesn't report connection errors immediately; it keeps retrying the connection until it eventually times out.
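Since a wrong broker address is one of the likely causes, a quick sanity check is to test whether each address in the `bootstrap.servers` value accepts a TCP connection at all. The following is a minimal sketch using only the JDK; the class name `BrokerReachabilityCheck`, its method names, and the default address `127.0.0.1:9092` are made up for illustration and are not part of Jet or Kafka. Passing this check does not rule out the other causes (for example a security misconfiguration), it only confirms that something is listening on the given host and port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class BrokerReachabilityCheck {

    /** Splits a bootstrap.servers style value ("host1:9092,host2:9092") into host/port pairs. */
    static List<InetSocketAddress> parseBootstrapServers(String bootstrapServers) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String entry : bootstrapServers.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got '" + trimmed + "'");
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // Unresolved on purpose: DNS lookup happens later, when we actually connect.
            addresses.add(InetSocketAddress.createUnresolved(host, port));
        }
        return addresses;
    }

    /** Returns true if a plain TCP connection can be opened within the timeout. */
    static boolean isReachable(InetSocketAddress address, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Resolve the host name now; an unknown host surfaces as an IOException below.
            socket.connect(new InetSocketAddress(address.getHostString(), address.getPort()), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the same value the job uses for bootstrap.servers; the default is only a placeholder.
        String bootstrapServers = args.length > 0 ? args[0] : "127.0.0.1:9092";
        for (InetSocketAddress address : parseBootstrapServers(bootstrapServers)) {
            String status = isReachable(address, 3000) ? "reachable" : "NOT reachable";
            System.out.println(address.getHostString() + ":" + address.getPort() + " is " + status);
        }
    }
}
```

Run it on each of the three cluster members with the exact `bootstrap.servers` string the job uses, because an address that works on one machine may not resolve or route on another.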

The Kafka client should produce other log entries that repeatedly show the cause, so make sure Kafka logging is enabled. For example:

[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Connection to node -1 (/127.0.0.1:55561) could not be established. Broker may not be available.
... lot more of these

10:40:40,681 DEBUG || - [JobExecutionService] hz._hzInstance_1_jet.jet.blocking.thread-2 - [127.0.0.1]:5701 [jet] [3.1-SNAPSHOT] Execution of job '336c-68c3-7b9d-0c2f', execution 5c43-a795-8172-bc57 completed with failure
java.util.concurrent.CompletionException: com.hazelcast.jet.JetException: Exception in ProcessorTasklet{writeKafka(7f147d66-3952-4e86-980a-226cc8e6ac9b)#1}: org.apache.kafka.common.errors.TimeoutException: Topic 7f147d66-3952-4e86-980a-226cc8e6ac9b not present in metadata after 60000 ms.
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at com.hazelcast.jet.impl.util.NonCompletableFuture.internalCompleteExceptionally(NonCompletableFuture.java:59)
    at com.hazelcast.jet.impl.execution.TaskletExecutionService$ExecutionTracker.taskletDone(TaskletExecutionService.java:398)
    at com.hazelcast.jet.impl.execution.TaskletExecutionService$BlockingWorker.run(TaskletExecutionService.java:255)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
    at java.util.concurrent.FutureTask.run(FutureTask.java)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.hazelcast.jet.JetException: Exception in ProcessorTasklet{writeKafka(7f147d66-3952-4e86-980a-226cc8e6ac9b)#1}: org.apache.kafka.common.errors.TimeoutException: Topic 7f147d66-3952-4e86-980a-226cc8e6ac9b not present in metadata after 60000 ms.
    at com.hazelcast.jet.impl.execution.TaskletExecutionService$BlockingWorker.run(TaskletExecutionService.java:250)
    ... 6 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Topic 7f147d66-3952-4e86-980a-226cc8e6ac9b not present in metadata after 60000 ms.
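If client warnings like the ones above do not show up, the Kafka loggers may be filtered out. The Kafka client logs through SLF4J, so how to enable it depends on the logging backend on the classpath. The sketch below assumes SLF4J is bound to `java.util.logging` (for example via `slf4j-jdk14`), which is an assumption about the setup and not something stated in the question; with Log4j or Logback the equivalent is a logger entry for `org.apache.kafka` in that backend's configuration. The class and method names are hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class KafkaClientLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so settings
    // applied to an otherwise unreferenced logger can be lost after garbage collection.
    private static final Logger KAFKA_LOGGER = Logger.getLogger("org.apache.kafka");

    /**
     * Lowers the threshold for all org.apache.kafka.* loggers and prints their records
     * to the console. Only has an effect if SLF4J is bound to java.util.logging (assumption).
     */
    static Logger enableKafkaClientLogging(Level level) {
        KAFKA_LOGGER.setLevel(level);

        // Add a console handler only once, even if this method is called repeatedly.
        boolean hasConsoleHandler = false;
        for (Handler existing : KAFKA_LOGGER.getHandlers()) {
            if (existing instanceof ConsoleHandler) {
                existing.setLevel(level);
                hasConsoleHandler = true;
            }
        }
        if (!hasConsoleHandler) {
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(level);
            KAFKA_LOGGER.addHandler(handler);
        }

        // Avoid printing each record a second time through the root logger's handlers.
        KAFKA_LOGGER.setUseParentHandlers(false);
        return KAFKA_LOGGER;
    }

    public static void main(String[] args) {
        // With the slf4j-jdk14 binding, SLF4J DEBUG maps to FINE and WARN maps to WARNING.
        enableKafkaClientLogging(Level.FINE);
    }
}
```

Call `enableKafkaClientLogging` before the Jet instance and the job are created, so that connection attempts made while the source or sink starts up are already logged.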
Oliv