Apache Ignite data rebalancing problem: "Will not own partition (there are owners to rebalance from)"

Question

I updated Apache Ignite to version 2.15.0.
Everything is fine when I launch the first node: it works and the client connects to it.
But when I launch the second node, an error occurs, as far as I understand at the rebalancing stage. Messages like this start to appear in the logs:

[2023-07-25T12:50:39,248][DEBUG][exchange-worker-#50][GridDhtPartitionTopologyImpl] Will not own partition (there are owners to rebalance from) [grp=CustomCacheItem, p=3, owners = [TcpDiscoveryNode [id=2cb9ce8d-2025-456d-82a6-c761ef7a98b7, consistentId=1NodeHostname, addrs=ArrayList [20.20.28.215, 127.0.0.1], sockAddrs=HashSet [1NodeHostname/20.20.28.215:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1690278575122, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, isClient=false]]]

Then these errors appear:

[2023-07-25T12:50:55,935][ERROR][tcp-disco-msg-worker-[2cb9ce8d 20.20.28.215:47500]-#2-#44][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=partition-exchanger, threadName=exchange-worker-#50, blockedFor=16s]

After that, the second node shuts down.

My questions:

Is the message "Will not own partition (there are owners to rebalance from)" related to the cause of the second node's crash?
If yes, what should I do? What does this message mean?
How can I find out what is causing the message "Blocked system-critical thread has been detected"?

It seems that the problem occurs during rebalancing. But what exactly is the problem?
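The "blockedFor=16s" part of the error is easier to read with the general idea behind it in mind: each system worker periodically records a heartbeat while it makes progress, and a watchdog reports any worker whose last heartbeat is older than a timeout (as far as I know, that timeout is configured via `IgniteConfiguration.setSystemWorkerBlockedTimeout`). The sketch below is a simplified, hypothetical model of that check, not Ignite's actual code; `BlockedWorkerModel`, `Worker` and `findBlocked` are made-up names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

/** Hypothetical, simplified model of a heartbeat-based blocked-worker check (not Ignite's code). */
public class BlockedWorkerModel {

    /** A worker that records the time of its last progress ("heartbeat"). */
    static final class Worker {
        final String name;
        final AtomicLong lastHeartbeatMs = new AtomicLong();

        Worker(String name, long nowMs) {
            this.name = name;
            lastHeartbeatMs.set(nowMs);
        }

        /** Called by the worker itself each time it completes a unit of work. */
        void heartbeat(long nowMs) {
            lastHeartbeatMs.set(nowMs);
        }
    }

    /** Reports every worker whose last heartbeat is older than timeoutMs. */
    static List<String> findBlocked(List<Worker> workers, long nowMs, long timeoutMs) {
        return workers.stream()
                .filter(w -> nowMs - w.lastHeartbeatMs.get() > timeoutMs)
                .map(w -> w.name + " blockedFor=" + (nowMs - w.lastHeartbeatMs.get()) / 1000 + "s")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Worker exchange = new Worker("exchange-worker-#50", 0L);
        Worker discovery = new Worker("disco-worker", 0L);

        // Discovery keeps making progress; the exchange worker never heartbeats after t=0.
        discovery.heartbeat(16_000L);

        // With a 10 s timeout, a check at t=16 s flags only the stuck worker.
        System.out.println(findBlocked(List.of(exchange, discovery), 16_000L, 10_000L));
    }
}
```

Running `main` prints `[exchange-worker-#50 blockedFor=16s]`. The takeaway from the model is that such a watchdog only knows which thread stopped reporting progress and for how long, not why; the reason has to come from that thread's stack trace.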

You should take a thread dump from the failed node. For some reason PME is stuck on the second node. It might be related to an ongoing transaction or something else. — Alexandr Shapkin, Jul 25 '23 at 11:25
Partition Map Exchange, https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood — Alexandr Shapkin, Jul 25 '23 at 11:46
I took a thread dump immediately after the first error message appeared: https://drive.google.com/file/d/1E2lMLUf8d8mkN0tRSP97GTrsZ3omddId/view?usp=sharing — Rodriguez, Jul 25 '23 at 18:02
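From outside the process, `jstack <pid>` or `jcmd <pid> Thread.print` are the standard JDK tools for taking a thread dump. If it is more convenient to capture the dump from inside the node's JVM, a minimal sketch using the standard `java.lang.management` API could look like this. The class name and the `exchange-worker` prefix filter are illustrative assumptions, chosen because that is the thread named in the error above. The sketch builds the stack trace by hand because `ThreadInfo.toString()` truncates it to a few frames.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ExchangeThreadDump {

    /** Returns one formatted dump (state, lock, full stack) per thread whose name starts with namePrefix. */
    static List<String> dumpThreads(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        // (true, true): also collect locked monitors and ownable synchronizers,
        // which helps show what a stuck thread is waiting on.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (!info.getThreadName().startsWith(namePrefix)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            // Full stack trace, not truncated like ThreadInfo.toString().
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\n\tat ").append(frame);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String prefix = args.length > 0 ? args[0] : "exchange-worker";
        List<String> dumps = dumpThreads(prefix);
        if (dumps.isEmpty()) {
            System.out.println("No threads named " + prefix + "*");
        }
        dumps.forEach(System.out::println);
    }
}
```

In a standalone run there is no exchange worker, so `main` only reports that nothing matched; the method is meant to be called from code running inside the node's JVM.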
