We are using GridGain 8.8.10 and JDK 11.

The Ignite cluster was created by following the instructions at: https://www.gridgain.com/docs/latest/installation-guide/kubernetes/gke-deployment

Ignite cluster is deployed in Kubernetes with native persistence enabled. We have 2 nodes in the cluster with both partitioned and replicated caches.

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="peerClassLoadingEnabled" value="true"/>
        <property name="metricsLogFrequency" value="60000"/>
        <property name="sqlConfiguration">
            <bean class="org.apache.ignite.configuration.SqlConfiguration">
                <property name="sqlGlobalMemoryQuota" value="300M"/>
                <property name="sqlQueryMemoryQuota" value="30M"/>
                <property name="sqlOffloadingEnabled" value="true"/>
            </bean>
        </property>
        <property name="workDirectory" value="/gridgain/work"/>

        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <!-- set the size of wal segments to 128MB -->
                <property name="walSegmentSize" value="#{128 * 1024 * 1024}"/>
                <!-- Set the page size to 8 KB -->
                <property name="pageSize" value="#{8 * 1024}"/>
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="name" value="Default_Region"/>
                        <!-- Memory region of 20 MB initial size. -->
                        <property name="initialSize" value="#{20L * 1024 * 1024}"/>
                        <!-- Memory region of 100 MB max size. -->
                        <property name="maxSize" value="#{100L * 1024 * 1024}"/>
                        <!-- Enabling eviction for this memory region. -->
                        <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
                        <property name="persistenceEnabled" value="true"/>
                        <!-- Enabling SEGMENTED_LRU page replacement for this region.  -->
                        <property name="pageReplacementMode" value="SEGMENTED_LRU"/>
                    </bean>
                </property>
                <property name="dataRegionConfigurations">
                    <list>
                        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                            <property name="name" value="abc_Region"/>
                            <!-- Memory region of 20 MB initial size. -->
                            <property name="initialSize" value="#{20 * 1024 * 1024}"/>
                            <!-- Maximum size is 2 GB  -->
                            <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
                            <!-- Enabling eviction for this memory region. -->
                            <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
                            <property name="persistenceEnabled" value="true"/>
                            <!-- Enabling SEGMENTED_LRU page replacement for this region.  -->
                            <property name="pageReplacementMode" value="SEGMENTED_LRU"/>
                        </bean>
                    </list>
                </property>
                <property name="walPath" value="/gridgain/wal"/>
                <property name="walArchivePath" value="/gridgain/wal"/>
            </bean>
        </property>
    </bean>

Some time after the cluster starts, we observe the following error in the log files of the Ignite cluster nodes.

    [04:35:22,493][SEVERE][sys-#46][GridDhtAtomicCache] <abc_cache> Failed to send TTL update request.
    org.apache.ignite.internal.processors.cache.GridCacheEntryRemovedException
        at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.checkObsolete(GridCacheMapEntry.java:2961)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.checkReadersLocked(GridDhtCacheEntry.java:709)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.checkReaders(GridDhtCacheEntry.java:685)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.readers(GridDhtCacheEntry.java:396)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$11.run(GridDhtCacheAdapter.java:1482)
        at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7283)
        at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:826)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

Has anyone faced similar issues?

dassum
  • This looks like a network timeout/error. These are common in Kubernetes environments. If this continues, I suggest you tune the connection settings to fit your use case. See: https://ignite.apache.org/docs/latest/clustering/network-configuration#connection-timeouts – Alex K Dec 15 '21 at 15:46
  • I tried updating the timeout configuration, but I am still facing the above issue. – dassum Dec 16 '21 at 05:41
  • You are giving too small a max size to the default data region. This could cause memory contention and affect overall system performance. I would increase this as much as possible. Make sure there is sufficient RAM allocated to your VM. This property is a last resort; remove it unless there is a strong case for it. Vary pageReplacementMode to see whether it makes a difference in your use case. – Alex K Dec 16 '21 at 15:40
  • We are using custom regions, not the default region, to create our caches. Our data size is not very large. We have native persistence enabled. I was expecting queries to return results more slowly when reading from disk, not memory pressure. – dassum Dec 20 '21 at 05:13
  • Try the recommended suggestions to rule out memory contention. Move everything to the default region and increase its size. Change the memory page size, as this setting might not play well with a cloud provider using SAN storage. – Alex K Dec 21 '21 at 00:49
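
For reference, the two suggestions from the comments (a longer connection timeout and a larger default data region) could be sketched roughly as the fragment below. This is only an illustration, not a confirmed fix for the TTL update error: the 30000 ms timeout and the 1 GB maximum size are assumed values chosen for the example, and the other properties from the question are omitted for brevity.

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Assumed value: failure detection timeout for server nodes, in milliseconds. -->
        <property name="failureDetectionTimeout" value="30000"/>

        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="name" value="Default_Region"/>
                        <!-- Assumed value: 1 GB max size instead of the original 100 MB. -->
                        <property name="maxSize" value="#{1L * 1024 * 1024 * 1024}"/>
                        <property name="persistenceEnabled" value="true"/>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

Any larger region size has to fit within the memory limit of the Kubernetes pod, alongside the JVM heap and the other data regions.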