I have set up a 3-node Apache Ignite cluster and noticed the following unexpected behavior:
(Tested with Ignite 2.10 and 2.13, Azul Java 11.0.13 on RHEL 8)
We have a relational table "RELATIONAL_META". It's created by our software vendor's product, which uses Ignite to exchange configuration data. The table is backed by the following cache, which is replicated to all nodes:
[cacheName=SQL_PUBLIC_RELATIONAL_META, cacheId=-252123144, grpName=null, grpId=-252123144, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]
Seen behavior:
I ran a failure test, simulating a disk failure of one of the Ignite nodes. The "failed" node restarts with an empty disk and joins the topology as expected. While the node is not yet part of the baseline topology (either because auto-adjust is disabled, or because auto-adjust has not yet completed), the restarted node returns empty results via the JDBC connection:
0: jdbc:ignite:thin://b2bivmign2/> select * from RELATIONAL_META;
+------------+--------------+------+-------+---------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+------------+--------------+------+-------+---------+
+------------+--------------+------+-------+---------+
No rows selected (0.018 seconds)
Interestingly, the node knows the structure of the table, just not the data it contains.
The table actually contains data, as I can see when I query against one of the other cluster nodes:
0: jdbc:ignite:thin://b2bivmign1/> select * from RELATIONAL_META;
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| cluster_configuration_1 | writer | change index | 1653 | 2023-01-24 10:25:27 |
| cluster_configuration_1 | writer | last run changes | 0 | Updated at 2023-01-29 11:08:48. |
| cluster_configuration_1 | writer | require full sync | false | Flag set to false on 2022-06-11 09:46:45 |
| cluster_configuration_1 | writer | schema version | 1.4 | Updated at 2022-06-11 09:46:25. Previous version was 1.3 |
| cluster_processing_1 | reader | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:50] |
| cluster_processing_1 | reader | change index | 1653 | 2023-01-29 10:20:39 |
| cluster_processing_1 | reader | conflicts | 0 | Reset due to full sync at 2022-06-11 09:50:12 |
| cluster_processing_1 | reader | require full sync | false | Cleared the flag after full reader sync at 2022-06-11 09:50:12 |
| cluster_processing_2 | reader | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:43] |
| cluster_processing_2 | reader | change index | 1653 | 2023-01-29 10:24:06 |
| cluster_processing_2 | reader | conflicts | 0 | Reset due to full sync at 2022-06-11 09:52:19 |
| cluster_processing_2 | reader | require full sync | false | Cleared the flag after full reader sync at 2022-06-11 09:52:19 |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
12 rows selected (0.043 seconds)
Expected behavior:
While a node is not part of the baseline, it is by definition not persisting data. So when I run a query against it, I would expect it to fetch the partitions it does not hold itself from the other nodes of the cluster. Instead, it just returns an empty result, even showing the correct structure of the table, just without any rows. This has caused inconsistent behavior in the product we are actually running, which uses Ignite as a configuration store: suddenly the nodes see different results depending on which node they have opened their JDBC connection to. We use a JDBC connection string that contains all the Ignite server nodes, so it fails over when one goes down, but of course that does not prevent the issue described here.
Is this "works as designed"? Is there any way to prevent such issues? It seems problematic to use Apache Ignite as a configuration store for an application with many nodes when it behaves like this.
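The only mitigation I can think of so far is a sanity check on the application side before trusting query results. A minimal sketch (my own idea, not vendor code), assuming the SYS.BASELINE_NODES system view that recent Ignite versions expose:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BaselineCheck {
    public static void main(String[] args) throws Exception {
        // Ask the node for the baseline and who is ONLINE before trusting results.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://b2bivmign1");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT CONSISTENT_ID, ONLINE FROM SYS.BASELINE_NODES")) {
            while (rs.next())
                System.out.println(rs.getString(1) + " online=" + rs.getBoolean(2));
        }
    }
}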
Regards, Sven
Update 1:
After restarting one of the nodes with an empty disk, it joins as a node with a new ID. I think that is expected behavior. We have baseline auto-adjust enabled, so the new node ID should join the baseline and the old one should leave it. This works, but until it completes, the node returns empty results to SQL queries.
Cluster state: active
Current topology version: 95
Baseline auto adjustment enabled: softTimeout=60000
Baseline auto-adjust is in progress
Current topology version: 95 (Coordinator: ConsistentId=cdf43fef-deb8-4732-907f-6264bd55de6f, Address=b2bivmign3.fritz.box/192.168.0.151, Order=11)
Baseline nodes:
ConsistentId=3ffe3798-9a63-4dc7-b7df-502ad9efc76c, Address=b2bivmign1.fritz.box/192.168.0.149, State=ONLINE, Order=64
ConsistentId=40a8ae8c-5f21-4f47-8f67-2b68f396dbb9, State=OFFLINE
ConsistentId=cdf43fef-deb8-4732-907f-6264bd55de6f, Address=b2bivmign3.fritz.box/192.168.0.151, State=ONLINE, Order=11
--------------------------------------------------------------------------------
Number of baseline nodes: 3
Other nodes:
ConsistentId=080fc170-1f74-44e5-8ac2-62b94e3258d9, Order=95
Number of other nodes: 1
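Auto-adjust, as shown in the listing above, can also be toggled through the public Java API. A minimal sketch (the class name is mine for illustration; the calls are IgniteCluster methods available since Ignite 2.8):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class BaselineAdjust {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("b2bi-config.xml")) {
            // Matches the softTimeout=60000 shown above.
            ignite.cluster().baselineAutoAdjustEnabled(true);
            ignite.cluster().baselineAutoAdjustTimeout(60_000);
            // Alternative with auto-adjust disabled: pin the baseline to the
            // current topology version manually.
            // ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
        }
    }
}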
Update 2:
This is the JDBC URL the application uses:
#distributed.jdbc.url - run configure to modify this property
distributed.jdbc.url=jdbc:ignite:thin://b2bivmign1.fritz.box:10800..10820,b2bivmign2.fritz.box:10800..10820,b2bivmign3.fritz.box:10800..10820
#distributed.jdbc.driver - run configure to modify this property
distributed.jdbc.driver=org.apache.ignite.IgniteJdbcThinDriver
We have seen it connect via JDBC to a node that was not part of the baseline and therefore receive empty results. I wonder why a node that is not part of the baseline returns any results at all, instead of fetching the data from the baseline nodes.
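To make the effect visible outside the product, here is a small test harness (my own, not the vendor code) that runs the same count against each node in turn; the result differs depending on whether the connected node is in the baseline:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NodeCompare {
    public static void main(String[] args) throws Exception {
        String[] hosts = {"b2bivmign1.fritz.box", "b2bivmign2.fritz.box", "b2bivmign3.fritz.box"};
        for (String host : hosts) {
            // Connect to exactly one node, bypassing the failover list.
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:ignite:thin://" + host + ":10800");
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM RELATIONAL_META")) {
                rs.next();
                System.out.println(host + " -> " + rs.getLong(1) + " rows");
            }
        }
    }
}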
Update 3:
Whether this happens seems to depend on the attributes of the table/cache; I cannot yet reproduce it with a table I create on my own, only with the table created by the product we use.
This is the cache of the table that I can reproduce the issue with:
[cacheName=SQL_PUBLIC_RELATIONAL_META, cacheId=-252123144, grpName=null, grpId=-252123144, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]
I have created two tables of my own for testing:
CREATE TABLE Test (
Key CHAR(10),
Value CHAR(10),
PRIMARY KEY (Key)
) WITH "BACKUPS=2";
CREATE TABLE Test2 (
Key CHAR(10),
Value CHAR(10),
PRIMARY KEY (Key)
) WITH "BACKUPS=2,atomicity=ATOMIC";
I then shut down one of the Ignite nodes, in this case b2bivmign3, remove the Ignite data folders, and start it again. It starts as a new node that is not part of the baseline, and I disabled auto-adjust to keep it in that state. I then connect to b2bivmign3 with the SQL CLI and query the tables:
0: jdbc:ignite:thin://b2bivmign3/> select * from Test;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.202 seconds)
0: jdbc:ignite:thin://b2bivmign3/> select * from Test2;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.029 seconds)
0: jdbc:ignite:thin://b2bivmign3/> select * from RELATIONAL_META;
+------------+--------------+------+-------+---------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+------------+--------------+------+-------+---------+
+------------+--------------+------+-------+---------+
No rows selected (0.043 seconds)
The same queries against one of the other Ignite nodes:
0: jdbc:ignite:thin://b2bivmign2/> select * from Test;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.074 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from Test2;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.023 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from RELATIONAL_META;
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| cluster_configuration_1 | writer | change index | 1653 | 2023-01-24 10:25:27 |
| cluster_configuration_1 | writer | last run changes | 0 | Updated at 2023-01-29 11:08:48. |
| cluster_configuration_1 | writer | require full sync | false | Flag set to false on 2022-06-11 09:46:45 |
| cluster_configuration_1 | writer | schema version | 1.4 | Updated at 2022-06-11 09:46:25. Previous version was 1.3 |
| cluster_processing_1 | reader | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:50] |
| cluster_processing_1 | reader | change index | 1653 | 2023-01-29 10:20:39 |
| cluster_processing_1 | reader | conflicts | 0 | Reset due to full sync at 2022-06-11 09:50:12 |
| cluster_processing_1 | reader | require full sync | false | Cleared the flag after full reader sync at 2022-06-11 09:50:12 |
| cluster_processing_2 | reader | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:43] |
| cluster_processing_2 | reader | change index | 1653 | 2023-01-29 10:24:06 |
| cluster_processing_2 | reader | conflicts | 0 | Reset due to full sync at 2022-06-11 09:52:19 |
| cluster_processing_2 | reader | require full sync | false | Cleared the flag after full reader sync at 2022-06-11 09:52:19 |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
12 rows selected (0.032 seconds)
I will test more tomorrow to find out which attribute of the table/cache triggers this issue.
Update 4:
I can reproduce this with a table that is set to mode=REPLICATED instead of PARTITIONED:
CREATE TABLE Test (
Key CHAR(10),
Value CHAR(10),
PRIMARY KEY (Key)
) WITH "BACKUPS=2";
[cacheName=SQL_PUBLIC_TEST, cacheId=-2066189417, grpName=null, grpId=-2066189417, prim=1024, mapped=1024, mode=PARTITIONED, atomicity=ATOMIC, backups=2, affCls=RendezvousAffinityFunction]
CREATE TABLE Test2 (
Key CHAR(10),
Value CHAR(10),
PRIMARY KEY (Key)
) WITH "BACKUPS=2,TEMPLATE=REPLICATED";
[cacheName=SQL_PUBLIC_TEST2, cacheId=372637563, grpName=null, grpId=372637563, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]
0: jdbc:ignite:thin://b2bivmign2/> select * from TEST;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.06 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from TEST2;
+-----+-------+
| KEY | VALUE |
+-----+-------+
+-----+-------+
No rows selected (0.014 seconds)
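For comparison, this is my reading of the two cache dumps above, expressed with the Java configuration API (equivalent settings, not taken from the running cluster):

import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class TableCaches {
    // Backing cache of TEST: partitioned, 2 backups.
    static CacheConfiguration<String, String> test() {
        return new CacheConfiguration<String, String>("SQL_PUBLIC_TEST")
                .setCacheMode(CacheMode.PARTITIONED)
                .setAtomicityMode(CacheAtomicityMode.ATOMIC)
                .setBackups(2);
    }

    // Backing cache of TEST2: replicated. This is why the dump reports
    // backups=2147483647 (Integer.MAX_VALUE): every node keeps a full copy.
    static CacheConfiguration<String, String> test2() {
        return new CacheConfiguration<String, String>("SQL_PUBLIC_TEST2")
                .setCacheMode(CacheMode.REPLICATED)
                .setAtomicityMode(CacheAtomicityMode.ATOMIC);
    }
}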
Testing with Visor:
It makes no difference where I run Visor; the results are the same.
We see that both caches backing the tables have 1 entry:
+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+
| SQL_PUBLIC_TEST(@c9)                    | PARTITIONED | 3     | 1 (0 / 1)                       | min: 0 (0 / 0)                    | min: 0    | min: 0    | min: 0    | min: 0    |
|                                         |             |       |                                 | avg: 0.33 (0.00 / 0.33)           | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
|                                         |             |       |                                 | max: 1 (0 / 1)                    | max: 0    | max: 0    | max: 0    | max: 0    |
+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+
| SQL_PUBLIC_TEST2(@c10)                  | REPLICATED  | 3     | 1 (0 / 1)                       | min: 0 (0 / 0)                    | min: 0    | min: 0    | min: 0    | min: 0    |
|                                         |             |       |                                 | avg: 0.33 (0.00 / 0.33)           | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
|                                         |             |       |                                 | max: 1 (0 / 1)                    | max: 0    | max: 0    | max: 0    | max: 0    |
+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+
One is empty when I scan it, the other has one row as expected:
visor> cache -scan -c=@c9
Entries in cache: SQL_PUBLIC_TEST
+================================================================================================================================================+
| Key Class | Key | Value Class | Value |
+================================================================================================================================================+
| java.lang.String | Sven | o.a.i.i.binary.BinaryObjectImpl | SQL_PUBLIC_TEST_466f2363_47ed_4fba_be80_e33740804b97 [hash=-900301401, VALUE=Demo] |
+------------------------------------------------------------------------------------------------------------------------------------------------+
visor> cache -scan -c=@c10
Cache: SQL_PUBLIC_TEST2 is empty
visor>
Update 5:
I have reduced the configuration file to a minimal one; the relevant part is that Ignite native persistence is enabled.
I did not manage to reproduce this with persistence turned off, as I could not keep a node out of the baseline then; it always joins immediately. So maybe this problem is only reproducible with persistence enabled.
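For reference, a sketch of the programmatic equivalent of that reduced configuration (assumption: the actual b2bi-config.xml enables native persistence on the default data region and otherwise keeps defaults):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ServerNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Enable native persistence; this is what makes the baseline topology
        // (and, apparently, this issue) relevant at all.
        cfg.setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(
                        new DataRegionConfiguration().setPersistenceEnabled(true)));
        Ignite ignite = Ignition.start(cfg);
        // With persistence on, a fresh cluster starts inactive and must be
        // activated (below I do this via Visor's top -activate):
        // ignite.cluster().state(ClusterState.ACTIVE);
    }
}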
I go to each of the 3 nodes, remove the Ignite data to start from scratch, and start the service:
[root@b2bivmign1,2,3 apache-ignite]# rm -rf db/ diagnostic/ snapshots/
[root@b2bivmign1,2,3 apache-ignite]# systemctl start apache-ignite@b2bi-config.xml.service
I open Visor, check in the topology that all nodes have joined, and then activate the cluster:
visor> top -activate
visor> quit
- I connect with sqlline and create my tables as shown above.
- I go to one of the servers, stop the service and delete the data, then start the service again:
[root@b2bivmign2 apache-ignite]# systemctl stop apache-ignite@b2bi-config.xml.service
[root@b2bivmign2 apache-ignite]# rm -rf db/ diagnostic/ snapshots/
[root@b2bivmign2 apache-ignite]# systemctl start apache-ignite@b2bi-config.xml.service
- The baseline then looks like in Update 1 above: the restarted node appears under "Other nodes", outside the baseline.
- I connect with sqlline to that node, and the issue reproduces.
This was reproduced on:
openjdk version "11.0.18" 2023-01-17 LTS
OpenJDK Runtime Environment Zulu11.62+17-CA (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.62+17-CA (build 11.0.18+10-LTS, mixed mode)
RPM: apache-ignite-2.14.0-1.noarch
Rocky Linux release 8.7 (Green Obsidian)