I am trying to configure WildFly 26.1 for high availability, as described here, with two servers (server A and server B) and HAProxy in front of them.
I'm using the default configuration from standalone-ha.xml, and everything works fine. Both servers run normally, and when server B goes down, HAProxy sends the requests to server A and the user keeps working on server A without losing the session.
The problem occurs when server A holds a large number of sessions and server B, while deploying the application, tries to fetch the current sessions from server A.
The error is:
2023-07-10 14:42:42,947 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 86) MSC000001: Failed to start service org.wildfly.clustering.infinispan.cache.web."app.war": org.jboss.msc.service.StartException in service org.wildfly.clustering.infinispan.cache.web."app.war": org.infinispan.commons.CacheException: Initial state transfer timed out for cache app.war on serverB
    at org.wildfly.clustering.service@26.0.1.Final//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:66)
    at org.wildfly.clustering.service@26.0.1.Final//org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117)
    at org.jboss.threads@2.4.0.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
    at org.jboss.threads@2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990)
    at org.jboss.threads@2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
    at org.jboss.threads@2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
    at java.base/java.lang.Thread.run(Thread.java:829)
    at org.jboss.threads@2.4.0.Final//org.jboss.threads.JBossThread.run(JBossThread.java:513)
Caused by: org.infinispan.commons.CacheException: Initial state transfer timed out for cache app.war on serverB
    at org.infinispan@12.1.7.Final//org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:249)
    at org.infinispan@12.1.7.Final//org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1018)
    at org.infinispan@12.1.7.Final//org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:512)
    at org.infinispan@12.1.7.Final//org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:698)
    at org.infinispan@12.1.7.Final//org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:644)
    at org.infinispan@12.1.7.Final//org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:533)
    at org.infinispan@12.1.7.Final//org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:511)
    at org.jboss.as.clustering.infinispan@26.0.1.Final//org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:85)
    at org.wildfly.clustering.infinispan.spi@26.0.1.Final//org.wildfly.clustering.infinispan.spi.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:77)
    at org.wildfly.clustering.infinispan.spi@26.0.1.Final//org.wildfly.clustering.infinispan.spi.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:55)
    at org.wildfly.clustering.service@26.0.1.Final//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:63)
    ... 7 more
This error happens when server B needs more than 4 minutes to receive the sessions from server A. I have read here that the default timeout is 4 minutes, and I also found here that the default state-transfer timeout is 4 minutes (240000 ms).
However, when I increased it, nothing changed.
This is my current configuration:
<subsystem xmlns="urn:jboss:domain:infinispan:13.0">
    <cache-container name="ejb" default-cache="dist" marshaller="PROTOSTREAM" aliases="sfsb" modules="org.wildfly.clustering.ejb.infinispan">
        <transport lock-timeout="960000"/>
        <replicated-cache name="sso">
            <locking isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <expiration interval="0"/>
            <state-transfer timeout="1360000"/>
        </replicated-cache>
        <distributed-cache name="dist">
            <locking acquire-timeout="10000" isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <expiration interval="1000" lifespan="10000" max-idle="10000"/>
            <file-store/>
        </distributed-cache>
    </cache-container>
    .......
</subsystem>
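For comparison, here is a minimal sketch of where a longer state-transfer timeout would sit if it is meant to cover the web session cache that fails in the log. It assumes, based only on the service name `org.wildfly.clustering.infinispan.cache.web."app.war"`, that the `app.war` cache is created from the default `dist` cache of the `web` cache container in standalone-ha.xml. In the configuration above, by contrast, `state-transfer` is set only on the `sso` cache. The timeout value simply reuses the 1360000 ms from the `sso` cache; the omitted attributes and child elements are assumed to stay as they already are in the file.

```xml
<!-- Sketch only: assumes app.war's session cache is derived from the
     default "dist" cache of the "web" cache container. -->
<cache-container name="web" default-cache="dist">
    <!-- other attributes, transport and caches (e.g. "sso") left unchanged -->
    <distributed-cache name="dist">
        <!-- existing locking/transaction/expiration/store settings left unchanged -->
        <!-- 1360000 ms is about 22.7 minutes, versus the 240000 ms (4 minute) default -->
        <state-transfer timeout="1360000"/>
    </distributed-cache>
</cache-container>
```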
There is a related post, but it doesn't solve my problem.