I have a Kafka consumer application that reads IoT data from a Kafka topic. It intermittently logs the following errors and warnings.
logs
2019-06-04T23:21:03.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:51:03.583 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053164658: The coordinator is not aware of this member.
2019-06-04T23:21:03.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:51:03.583 WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Asynchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053164658, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:21:03.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:51:03.583 WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Synchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053167516, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
I have already tried multiple combinations of the max.poll.records and max.poll.interval.ms configurations. I even tried increasing request.timeout.ms, but these errors and warnings still don't stop.
Note: I do not have control over the broker, hence I cannot try changing session.timeout.ms as it needs to be within the range of group.min.session.timeout.ms and group.max.session.timeout.ms configuration of the broker.
application.yml
spring:
  kafka:
    consumer:
      group-id: iot
      auto-offset-reset: earliest
      properties:
        fetch.max.wait.ms: 10000
        fetch.min.bytes: 30000000
        retry.backoff.ms: 1000
        max.poll.records: 4000000
        max.poll.interval.ms: 720000
        request.timeout.ms: 900000
Currently the behaviour is erratic, as the following logs show.
logs
2019-06-04T23:08:03.82+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:38:03.827 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053064069: The coordinator is not aware of this member.
2019-06-04T23:08:03.82+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:38:03.827 WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Asynchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053064069, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:08:03.82+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:38:03.827 WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Synchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053066926, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:09:43.04+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:39:43.044 WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Synchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053064069, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:10:00.13+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:00.130 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 119. No Of measures: 2857
2019-06-04T23:10:12.90+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:12.909 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 125. No Of measures: 2893
2019-06-04T23:10:22.94+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:22.948 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 74. No Of measures: 2880
2019-06-04T23:10:34.44+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:34.445 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 73. No Of measures: 2862
2019-06-04T23:10:50.50+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:50.501 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 73. No Of measures: 2866
2019-06-04T23:10:56.08+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:56.086 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053075561: The coordinator is not aware of this member.
2019-06-04T23:10:56.08+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:56.086 WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Asynchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053075561, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:10:56.08+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:56.086 WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Synchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053078427, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:11:03.86+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:03.867 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053078427: The coordinator is not aware of this member.
2019-06-04T23:11:05.50+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:05.506 WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Asynchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053078427, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:11:33.74+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:33.743 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 79. No Of measures: 2862
2019-06-04T23:11:45.66+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:45.664 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 109. No Of measures: 2866
2019-06-04T23:11:56.49+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:56.492 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 75. No Of measures: 2880
2019-06-04T23:12:08.39+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:42:08.390 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 90. No Of measures: 2889
2019-06-04T23:12:15.71+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:42:15.716 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053078427: The request timed out.
2019-06-04T23:12:25.00+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:42:25.001 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 80. No Of measures: 2880
2019-06-04T23:12:43.71+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:42:43.714 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 97. No Of measures: 2870
2019-06-04T23:13:02.37+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:43:02.374 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 121. No Of measures: 2868
2019-06-04T23:13:21.72+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:43:21.724 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 99. No Of measures: 2867
2019-06-04T23:13:42.36+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:43:42.368 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 101. No Of measures: 2860
2019-06-04T23:14:01.73+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:44:01.737 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 145. No Of measures: 2862
2019-06-04T23:14:19.28+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:44:19.287 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 118. No Of measures: 2873
2019-06-04T23:14:37.63+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:44:37.630 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 104. No Of measures: 2866
2019-06-04T23:14:55.88+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:44:55.889 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 117. No Of measures: 2880
2019-06-04T23:15:12.29+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:45:12.298 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 203. No Of measures: 2880
2019-06-04T23:15:31.48+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:45:31.480 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 105. No Of measures: 2880
2019-06-04T23:15:51.25+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:45:51.251 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 176. No Of measures: 2880
2019-06-04T23:16:06.69+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:46:06.692 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 157. No Of measures: 2880
2019-06-04T23:16:23.27+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:46:23.271 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 110. No Of measures: 2880
2019-06-04T23:16:39.18+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:46:39.184 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 88. No Of measures: 2880
2019-06-04T23:16:58.28+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:46:58.285 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 108. No Of measures: 2880
2019-06-04T23:17:17.67+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:47:17.676 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 141. No Of measures: 2885
2019-06-04T23:17:36.67+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:47:36.669 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 107. No Of measures: 2880
2019-06-04T23:17:53.78+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:47:53.783 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 344. No Of measures: 2855
2019-06-04T23:18:12.35+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:48:12.351 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 67. No Of measures: 2880
2019-06-04T23:18:29.12+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:48:29.129 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 109. No Of measures: 2895
2019-06-04T23:18:46.31+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:48:46.313 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 131. No Of measures: 2861
2019-06-04T23:19:03.72+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:49:03.729 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 116. No Of measures: 2880
2019-06-04T23:19:22.91+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:49:22.913 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 113. No Of measures: 2867
2019-06-04T23:19:40.83+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:49:40.832 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 118. No Of measures: 2859
2019-06-04T23:19:58.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:49:58.587 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 106. No Of measures: 2880
2019-06-04T23:20:16.08+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:50:16.086 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 89. No Of measures: 2880
2019-06-04T23:20:35.23+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:50:35.239 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 163. No Of measures: 2854
2019-06-04T23:20:55.44+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:50:55.446 INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl : Time taken(ms) 214. No Of measures: 2858
2019-06-04T23:21:03.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:51:03.583 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053164658: The coordinator is not aware of this member.
Any advice to resolve this problem is highly appreciated.
P.S.: Should I consider isolating the consumer listener thread from the data processing by handing the records off to worker threads, as pointed out in this question?
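To make the P.S. concrete, this is roughly the hand-off I have in mind. It is only a sketch using plain java.util.concurrent, with no Kafka or Spring code in it: the class name HandOffSketch, the method run, the queue capacity of 4, and the simulated "measure-..." records are all made up for illustration. The loop that builds batches stands in for the listener thread, and the workers only count records instead of doing my real processing. It also does not address when offsets should be committed, which I assume would need care if processing finishes after the listener has moved on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: the "listener" only enqueues batches; worker threads process them. */
public class HandOffSketch {

    // Sentinel batch that tells a worker to stop (compared by reference).
    private static final List<String> POISON = new ArrayList<>();

    /**
     * Simulates `batches` polls of `batchSize` records each and returns
     * the number of records the workers processed.
     */
    public static long run(int batches, int batchSize, int workers) {
        // Bounded queue: the producer blocks when the workers fall behind.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4);
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        List<String> batch = queue.take();
                        if (batch == POISON) {
                            return; // shutdown signal for this worker
                        }
                        // Stand-in for the real per-batch processing.
                        processed.addAndGet(batch.size());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Stand-in for the listener thread: build a batch and hand it off.
            for (int b = 0; b < batches; b++) {
                List<String> batch = new ArrayList<>(batchSize);
                for (int i = 0; i < batchSize; i++) {
                    batch.add("measure-" + b + "-" + i);
                }
                queue.put(batch);
            }
            // One sentinel per worker so every worker terminates.
            for (int w = 0; w < workers; w++) {
                queue.put(POISON);
            }
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while handing off batches", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        // 5 simulated polls of 2880 records each, processed by 2 workers.
        System.out.println("processed=" + run(5, 2880, 2));
    }
}
```

I chose a bounded queue so that a slow worker pool cannot pile up records in memory, but I realise that a blocking put on the listener thread would again delay the next poll(), so I am not sure whether this actually helps with the rebalancing problem.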