I am setting up a replicated Infinispan cache (9.4.16.Final, WildFly 18.1) across two nodes (server1 and server2), and the initial state transfer times out on startup. This only happens when I am upgrading my application.

ERROR [2020-02-14 21:54:47,870] [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-13,ejb,server1) ISPN000474: Error processing request 2017@server2
ERROR [2020-02-14 21:54:47,876] [thread-13,ejb,server1] [transport.jgroups.JGroupsTransport] [] - ISPN000474: Error processing request 2020@server2

ERROR [2020-02-14 20:49:54,732] [org.jboss.msc.service.fail] (ServerService Thread Pool -- 90) MSC000001: Failed to start service org.wildfly.clustering.infinispan.cache.mycontainer.mycache: org.jboss.msc.service.StartException in service org.wildfly.clustering.infinispan.cache.mycontainer.mycache: org.infinispan.commons.CacheException: Initial state transfer timed out for cache mycache on server1
    at org.wildfly.clustering.service@18.0.1.Final//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:70)
    at org.wildfly.clustering.service@18.0.1.Final//org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
    at java.base/java.lang.Thread.run(Thread.java:834)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.JBossThread.run(JBossThread.java:485)
Caused by: org.infinispan.commons.CacheException: Initial state transfer timed out for cache mycache on server1
    at org.infinispan@9.4.16.Final//org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:238)
    at org.infinispan@9.4.16.Final//org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1113)
    at org.infinispan@9.4.16.Final//org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:511)
    at org.infinispan@9.4.16.Final//org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:657)
    at org.infinispan@9.4.16.Final//org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:601)
    at org.infinispan@9.4.16.Final//org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:484)
    at org.infinispan@9.4.16.Final//org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:468)
    at org.infinispan@9.4.16.Final//org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:454)
    at org.jboss.as.clustering.infinispan@18.0.1.Final//org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:83)
    at org.wildfly.clustering.infinispan.spi@18.0.1.Final//org.wildfly.clustering.infinispan.spi.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:77)
    at org.wildfly.clustering.infinispan.spi@18.0.1.Final//org.wildfly.clustering.infinispan.spi.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:55)
    at org.wildfly.clustering.service@18.0.1.Final//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:67)
    ... 7 more

ERROR [2020-02-14 21:54:47,864] [Controller Boot Thread] [as.controller.management-operation] [] - WFLYCTL0013: Operation ("add") failed - address: ([
    ("subsystem" => "infinispan"),
    ("cache-container" => "mycontainer"),
    ("replicated-cache" => "mycache"),
    ("component" => "backups")
]) - failure description: {"WFLYCTL0080: Failed services" => {"org.wildfly.clustering.infinispan.cache.mycontainer.mycache" => "org.infinispan.commons.CacheException: Initial state transfer timed out for cache mycache on server1
    Caused by: org.infinispan.commons.CacheException: Initial state transfer timed out for cache mycache on server1"}}

My configuration:

<cache-container name="mycontainer">
    <transport/>
    <replicated-cache name="mycache">
        <locking acquire-timeout="30000" isolation="REPEATABLE_READ"/>
        <expiration interval="60000" lifespan="1200000" max-idle="-1"/>
        <file-store/>
    </replicated-cache>
</cache-container>

Can someone help? I've spent several hours trying to fix this but haven't had any luck. Thanks!

Steve
  • Hi Steve, what do you mean by "happens when I am upgrading my application" ? – Diego Feb 15 '20 at 11:33
  • Hi Diego. It's when I'm upgrading to a new build (i.e. new ear version but the infinispan version stays the same). – Steve Feb 16 '20 at 18:56

3 Answers

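You have two options:

1- Increase the state transfer timeout (the timeout attribute of <state-transfer>)

OR

2- Set await-initial-transfer="false" so that the cache starts without waiting for the initial state transfer to complete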

    <replicated-cache name="mycache">
        <locking acquire-timeout="30000" isolation="REPEATABLE_READ"/>
        <expiration interval="60000" lifespan="1200000" max-idle="-1"/>
        <file-store/>
        <state-transfer enabled="true" timeout="60000" await-initial-transfer="false"/>
    </replicated-cache>
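
For comparison, option 1 on its own might look like the sketch below. The 300000 ms value is an arbitrary illustration, not a recommendation, and the attribute name is taken from the snippet above; verify it against the Infinispan subsystem schema shipped with your WildFly version.

```xml
<replicated-cache name="mycache">
    <locking acquire-timeout="30000" isolation="REPEATABLE_READ"/>
    <expiration interval="60000" lifespan="1200000" max-idle="-1"/>
    <file-store/>
    <!-- Hypothetical value: wait up to 5 minutes for the initial state transfer -->
    <state-transfer timeout="300000"/>
</replicated-cache>
```

A longer timeout only helps if the transfer is slow rather than failing outright, so it is worth checking the logs of the other node for errors first.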
Diego
  • Thanks for your answer. I have set await-initial-transfer=false and the server starts now, which it wasn't doing before. – Steve Feb 18 '20 at 04:01
  • Now the problem is that, when I do a rolling upgrade on "server1" and "server2", cache entries cannot be replicated while the servers are running different versions (Caused by: org.jboss.modules.ModuleNotFoundException: deployment.myapplication-1.0.ear). In this example, "server2" is giving that error and "server1" is running deployment.myapplication-1.1.ear (after being upgraded). This results in cache entries being lost during upgrades. Am I trying to do something Infinispan does not support? This wasn't a problem with JBoss Cache. – Steve Feb 18 '20 at 04:13
  • Hi Steve, what do you mean by "servers are running a different version" ? Also, " This results in cache entries being lost during upgrades". Do you mean that the cache entry will be null and later it will have the latest value? – Diego Feb 18 '20 at 05:57
  • Hmm, what is the cache size in MB? – Diego Feb 18 '20 at 06:09
  • Hi Diego. Sorry, I am not explaining this well. The servers are running different versions of the deployed ear because I am doing a rolling upgrade to keep one of them available at all times. The code hasn't changed, but it has been recompiled and has a slightly newer version (myapplication-1.0.0.0.ear -> myapplication-1.0.0.1.ear). For some reason, Infinispan doesn't like that the ears have different versions and fails to replicate cache entries. The caches are not large, and I run into this problem even when they have fewer than 10 entries. – Steve Feb 18 '20 at 15:22
  • Basically, what I want is for infinispan to not fail to replicate when the version of the ear changes during upgrades. FYI, the class actually being stored has a serialVersionUID which does not change. – Steve Feb 18 '20 at 15:23

I solved this by removing the version from the ear that gets deployed. Infinispan is unable (probably by design) to replicate cache entries when the module names do not match exactly. Removing the version is not an ideal solution in my opinion but it gets the job done. The initial state transfer timeout error stopped happening as soon as the version was removed.
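
As an illustration only, one way to drop the version from the ear file name is shown below. It assumes the ear is built with Maven (not stated in this thread) and that the deployment name follows the file name; the `myapplication` name is a placeholder.

```xml
<!-- pom.xml of the ear module (hypothetical project layout) -->
<build>
    <!-- Produces target/myapplication.ear instead of myapplication-VERSION.ear -->
    <finalName>myapplication</finalName>
</build>
```

With a stable file name, both nodes would refer to the same module name (deployment.myapplication.ear) regardless of which build they run.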

Steve

With an Infinispan cluster, there is one more scenario to look out for in this situation, particularly involving the cache org.infinispan.CONFIG. If the state transfer of org.infinispan.CONFIG times out, it is very likely that one node in the cluster is causing it. Try to find the node that is failing to request the state of org.infinispan.CONFIG, and restart that node. Most likely there will be a warning logged by org.infinispan.statetransfer.InboundTransferTask.

YS_NE