
I have set up a 3-node Apache Ignite cluster and noticed the following unexpected behavior:

(Tested with Ignite 2.10 and 2.13, Azul Java 11.0.13 on RHEL 8)

We have a relational table "RELATIONAL_META". It is created by our software vendor's product, which uses Ignite to exchange configuration data. The table is backed by the following cache, which is replicated to all nodes:

[cacheName=SQL_PUBLIC_RELATIONAL_META, cacheId=-252123144, grpName=null, grpId=-252123144, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]

Seen behavior:

I did a failure test, simulating a disk failure on one of the Ignite nodes. The "failed" node restarts with an empty disk and joins the topology as expected. While the node is not yet part of the baseline, either because auto-adjust is disabled or because auto-adjust has not yet completed, the restarted node returns empty results over its JDBC connection:

0: jdbc:ignite:thin://b2bivmign2/> select * from RELATIONAL_META;
+------------+--------------+------+-------+---------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+------------+--------------+------+-------+---------+
+------------+--------------+------+-------+---------+
No rows selected (0.018 seconds)

Interestingly, the node knows the structure of the table, but not the data it contains.

The table does actually contain data, as I can see when I query one of the other cluster nodes:

0: jdbc:ignite:thin://b2bivmign1/> select * from RELATIONAL_META;
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
|       CLUSTER_ID        | CLUSTER_TYPE |         NAME         | VALUE |                                                     DETAILS                                                      |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| cluster_configuration_1 | writer       | change index         | 1653  | 2023-01-24 10:25:27                                                                                              |
| cluster_configuration_1 | writer       | last run changes     | 0     | Updated at 2023-01-29 11:08:48.                                                                                  |
| cluster_configuration_1 | writer       | require full sync    | false | Flag set to false on 2022-06-11 09:46:45                                                                         |
| cluster_configuration_1 | writer       | schema version       | 1.4   | Updated at 2022-06-11 09:46:25. Previous version was 1.3                                                         |
| cluster_processing_1    | reader       | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:50] |
| cluster_processing_1    | reader       | change index         | 1653  | 2023-01-29 10:20:39                                                                                              |
| cluster_processing_1    | reader       | conflicts            | 0     | Reset due to full sync at 2022-06-11 09:50:12                                                                    |
| cluster_processing_1    | reader       | require full sync    | false | Cleared the flag after full reader sync at 2022-06-11 09:50:12                                                   |
| cluster_processing_2    | reader       | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:43] |
| cluster_processing_2    | reader       | change index         | 1653  | 2023-01-29 10:24:06                                                                                              |
| cluster_processing_2    | reader       | conflicts            | 0     | Reset due to full sync at 2022-06-11 09:52:19                                                                    |
| cluster_processing_2    | reader       | require full sync    | false | Cleared the flag after full reader sync at 2022-06-11 09:52:19                                                   |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
12 rows selected (0.043 seconds)

Expected behavior:

While a node is not part of the baseline, it is by definition not persisting data. So when I run a query against it, I would expect it to fetch the partitions it does not hold locally from the other nodes of the cluster. Instead it just returns an empty result, showing the correct structure of the table but no rows. This has caused inconsistent behavior in the product we are actually running, which uses Ignite as a configuration store, because the application nodes suddenly see different results depending on which Ignite node their JDBC connection happens to be opened against. We use a JDBC connection string that lists all the Ignite server nodes, so it fails over when one goes down, but of course that does not prevent the issue described here.

Is this "works a designed"? Is there any way to prevent such issues? It seems to be problematic to use Apache Ignite as a configuration store for an application with many nodes, when it behaves like this.

Regards, Sven

Update:

After restarting one of the nodes with an empty disk, it joins as a node with a new ID; I think that is expected behavior. We have baseline auto-adjust enabled, so the new node ID should be added to the baseline and the old one removed. This works, but until it completes, the node returns empty results to SQL queries.

Cluster state: active
Current topology version: 95
Baseline auto adjustment enabled: softTimeout=60000
Baseline auto-adjust is in progress

Current topology version: 95 (Coordinator: ConsistentId=cdf43fef-deb8-4732-907f-6264bd55de6f, Address=b2bivmign3.fritz.box/192.168.0.151, Order=11)

Baseline nodes:
    ConsistentId=3ffe3798-9a63-4dc7-b7df-502ad9efc76c, Address=b2bivmign1.fritz.box/192.168.0.149, State=ONLINE, Order=64
    ConsistentId=40a8ae8c-5f21-4f47-8f67-2b68f396dbb9, State=OFFLINE
    ConsistentId=cdf43fef-deb8-4732-907f-6264bd55de6f, Address=b2bivmign3.fritz.box/192.168.0.151, State=ONLINE, Order=11
--------------------------------------------------------------------------------
Number of baseline nodes: 3

Other nodes:
    ConsistentId=080fc170-1f74-44e5-8ac2-62b94e3258d9, Order=95
Number of other nodes: 1
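
For reference, the auto-adjust state can also be checked, and the baseline forced, from code instead of control.sh. This is only a minimal sketch assuming access to an Ignite node handle inside the application; it is not something the vendor's product does:

import org.apache.ignite.Ignite;

public class BaselineCheck {
    /** Prints the baseline auto-adjust settings and optionally forces the baseline to the current topology. */
    static void printAndMaybeAdjust(Ignite ignite, boolean forceToCurrentTopology) {
        boolean enabled = ignite.cluster().isBaselineAutoAdjustEnabled();
        long softTimeout = ignite.cluster().baselineAutoAdjustTimeout();
        System.out.println("auto-adjust enabled=" + enabled + ", softTimeout=" + softTimeout + " ms");

        if (forceToCurrentTopology) {
            // Set the baseline to the current topology version right away instead of
            // waiting for the soft timeout (60000 ms in our setup) to expire.
            ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
        }
    }
}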

Update 2:

This is the JDBC URL the application uses:

#distributed.jdbc.url - run configure to modify this property
distributed.jdbc.url=jdbc:ignite:thin://b2bivmign1.fritz.box:10800..10820,b2bivmign2.fritz.box:10800..10820,b2bivmign3.fritz.box:10800..10820

#distributed.jdbc.driver - run configure to modify this property
distributed.jdbc.driver=org.apache.ignite.IgniteJdbcThinDriver

We have seen the application connect via JDBC to a node that was not part of the baseline and therefore receive empty results. I wonder why a node that is not part of the baseline returns any results at all, instead of fetching the data from the baseline nodes?
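
For what it's worth, the application side boils down to something like this (a minimal sketch using the host list from the property above; everything else is assumed defaults, this is not the vendor's actual code):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MetaQuery {
    public static void main(String[] args) throws Exception {
        // Thin JDBC driver with all three server nodes; the driver connects to one
        // endpoint and only fails over to another when that connection breaks.
        String url = "jdbc:ignite:thin://b2bivmign1.fritz.box:10800..10820,"
            + "b2bivmign2.fritz.box:10800..10820,b2bivmign3.fritz.box:10800..10820";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select * from RELATIONAL_META")) {
            while (rs.next())
                System.out.println(rs.getString("NAME") + " = " + rs.getString("VALUE"));
        }
    }
}

So depending on which endpoint the driver happens to pick, the same query returns 12 rows or none.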

Update 3:

Whether this happens seems to depend on the table's/cache's attributes: I cannot yet reproduce it with a table I create myself, only with the table created by the product we use.

This is the cache of the table that I can reproduce the issue with:

[cacheName=SQL_PUBLIC_RELATIONAL_META, cacheId=-252123144, grpName=null, grpId=-252123144, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]

I have created 2 tables of my own for testing:

CREATE TABLE Test (
  Key CHAR(10),
  Value CHAR(10),
  PRIMARY KEY (Key)
) WITH "BACKUPS=2";


CREATE TABLE Test2 (
  Key CHAR(10),
  Value CHAR(10),
  PRIMARY KEY (Key)
) WITH "BACKUPS=2,atomicity=ATOMIC";

I then shut down one of the Ignite nodes, in this case b2bivmign3, remove the Ignite data folders, and start it again. It starts as a new node that is not part of the baseline, and I have disabled auto-adjust to keep it that way. I then connect to b2bivmign3 with the SQL CLI and query the tables:

0: jdbc:ignite:thin://b2bivmign3/> select * from Test;
+------+-------+
| KEY  | VALUE |
+------+-------+
| Sven | Demo  |
+------+-------+
1 row selected (0.202 seconds)
0: jdbc:ignite:thin://b2bivmign3/> select * from Test2;
+------+-------+
| KEY  | VALUE |
+------+-------+
| Sven | Demo  |
+------+-------+
1 row selected (0.029 seconds)
0: jdbc:ignite:thin://b2bivmign3/> select * from RELATIONAL_META;
+------------+--------------+------+-------+---------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+------------+--------------+------+-------+---------+
+------------+--------------+------+-------+---------+
No rows selected (0.043 seconds)

The same when I connect to one of the other Ignite nodes:

0: jdbc:ignite:thin://b2bivmign2/> select * from Test;
+------+-------+
| KEY  | VALUE |
+------+-------+
| Sven | Demo  |
+------+-------+
1 row selected (0.074 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from Test2;
+------+-------+
| KEY  | VALUE |
+------+-------+
| Sven | Demo  |
+------+-------+
1 row selected (0.023 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from RELATIONAL_META;
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
|       CLUSTER_ID        | CLUSTER_TYPE |         NAME         | VALUE |                                                     DETAILS                                                      |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| cluster_configuration_1 | writer       | change index         | 1653  | 2023-01-24 10:25:27                                                                                              |
| cluster_configuration_1 | writer       | last run changes     | 0     | Updated at 2023-01-29 11:08:48.                                                                                  |
| cluster_configuration_1 | writer       | require full sync    | false | Flag set to false on 2022-06-11 09:46:45                                                                         |
| cluster_configuration_1 | writer       | schema version       | 1.4   | Updated at 2022-06-11 09:46:25. Previous version was 1.3                                                         |
| cluster_processing_1    | reader       | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:50] |
| cluster_processing_1    | reader       | change index         | 1653  | 2023-01-29 10:20:39                                                                                              |
| cluster_processing_1    | reader       | conflicts            | 0     | Reset due to full sync at 2022-06-11 09:50:12                                                                    |
| cluster_processing_1    | reader       | require full sync    | false | Cleared the flag after full reader sync at 2022-06-11 09:50:12                                                   |
| cluster_processing_2    | reader       | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:43] |
| cluster_processing_2    | reader       | change index         | 1653  | 2023-01-29 10:24:06                                                                                              |
| cluster_processing_2    | reader       | conflicts            | 0     | Reset due to full sync at 2022-06-11 09:52:19                                                                    |
| cluster_processing_2    | reader       | require full sync    | false | Cleared the flag after full reader sync at 2022-06-11 09:52:19                                                   |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
12 rows selected (0.032 seconds)

I will test more tomorrow to find out which attribute of the table/cache triggers this issue.

Update 4:

I can reproduce this with a table that is set to mode=REPLICATED instead of PARTITIONED.

CREATE TABLE Test (
  Key CHAR(10),
  Value CHAR(10),
  PRIMARY KEY (Key)
) WITH "BACKUPS=2";

[cacheName=SQL_PUBLIC_TEST, cacheId=-2066189417, grpName=null, grpId=-2066189417, prim=1024, mapped=1024, mode=PARTITIONED, atomicity=ATOMIC, backups=2, affCls=RendezvousAffinityFunction]

CREATE TABLE Test2 (
  Key CHAR(10),
  Value CHAR(10),
  PRIMARY KEY (Key)
) WITH "BACKUPS=2,TEMPLATE=REPLICATED";

[cacheName=SQL_PUBLIC_TEST2, cacheId=372637563, grpName=null, grpId=372637563, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]

0: jdbc:ignite:thin://b2bivmign2/> select * from TEST;
+------+-------+
| KEY  | VALUE |
+------+-------+
| Sven | Demo  |
+------+-------+
1 row selected (0.06 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from TEST2;
+-----+-------+
| KEY | VALUE |
+-----+-------+
+-----+-------+
No rows selected (0.014 seconds)
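
As far as I understand, the difference between the two tables comes down to the cache mode; in Java terms the two caches above correspond to roughly this (a sketch for illustration, not what the product actually configures):

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheModes {
    // Roughly what backs TEST: a partitioned cache with 2 backup copies per partition.
    static CacheConfiguration<String, String> partitionedLikeTest() {
        return new CacheConfiguration<String, String>("SQL_PUBLIC_TEST")
            .setCacheMode(CacheMode.PARTITIONED)
            .setBackups(2);
    }

    // Roughly what backs TEST2: a replicated cache where every server node keeps a full copy,
    // which is why backups shows up as 2147483647 (Integer.MAX_VALUE) in the cache info above.
    static CacheConfiguration<String, String> replicatedLikeTest2() {
        return new CacheConfiguration<String, String>("SQL_PUBLIC_TEST2")
            .setCacheMode(CacheMode.REPLICATED);
    }
}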

Testing with Visor:

It makes no difference where I run Visor; the results are the same.

We see both caches for the tables have 1 entry:

+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+
| SQL_PUBLIC_TEST(@c9)                    | PARTITIONED | 3     | 1 (0 / 1)                       | min: 0 (0 / 0)                    | min: 0    | min: 0    | min: 0    | min: 0    |
|                                         |             |       |                                 | avg: 0.33 (0.00 / 0.33)           | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
|                                         |             |       |                                 | max: 1 (0 / 1)                    | max: 0    | max: 0    | max: 0    | max: 0    |
+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+
| SQL_PUBLIC_TEST2(@c10)                  | REPLICATED  | 3     | 1 (0 / 1)                       | min: 0 (0 / 0)                    | min: 0    | min: 0    | min: 0    | min: 0    |
|                                         |             |       |                                 | avg: 0.33 (0.00 / 0.33)           | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
|                                         |             |       |                                 | max: 1 (0 / 1)                    | max: 0    | max: 0    | max: 0    | max: 0    |
+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+

One is empty when I scan it, the other has one row as expected:

visor> cache -scan -c=@c9
Entries in  cache: SQL_PUBLIC_TEST
+================================================================================================================================================+
|    Key Class     | Key  |           Value Class           |                                       Value                                        |
+================================================================================================================================================+
| java.lang.String | Sven | o.a.i.i.binary.BinaryObjectImpl | SQL_PUBLIC_TEST_466f2363_47ed_4fba_be80_e33740804b97 [hash=-900301401, VALUE=Demo] |
+------------------------------------------------------------------------------------------------------------------------------------------------+
visor> cache -scan -c=@c10
Cache: SQL_PUBLIC_TEST2 is empty
visor>
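
A similar check can be done from code with the thin client, connecting it directly to the restarted node (a sketch; cache names as above, address and port are assumptions for my setup):

import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ScanCheck {
    public static void main(String[] args) {
        // Connect the thin client to the node that rejoined outside the baseline.
        try (IgniteClient client = Ignition.startClient(
                new ClientConfiguration().setAddresses("b2bivmign3.fritz.box:10800"))) {
            for (String name : new String[] {"SQL_PUBLIC_TEST", "SQL_PUBLIC_TEST2"}) {
                ClientCache<Object, Object> cache = client.cache(name);
                System.out.println(name + " size=" + cache.size(CachePeekMode.ALL));
            }
        }
    }
}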

Update 5:

I have reduced the configuration file to this:

https://pastebin.com/dL9Jja8Z

I did not manage to reproduce this with persistence turned off, because then I cannot keep a node out of the baseline; it always joins immediately. So maybe this problem is only reproducible with persistence enabled.
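
The part of that configuration that seems to matter here is native persistence on the default data region; reduced to Java it is essentially this (a sketch equivalent to the Spring XML in the pastebin, not a copy of it):

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PersistentConfig {
    static IgniteConfiguration configuration() {
        // Native persistence on the default data region. Without it a restarted node
        // rejoins the baseline immediately and the empty-result window never appears.
        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(
                new DataRegionConfiguration().setPersistenceEnabled(true));

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}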

  1. I go to each of the 3 nodes, remove the Ignite data to start from scratch, and start the service:

    [root@b2bivmign1,2,3 apache-ignite]# rm -rf db/ diagnostic/ snapshots/
    [root@b2bivmign1,2,3 apache-ignite]# systemctl start apache-ignite@b2bi-config.xml.service

  2. I open visor, check the topology that all nodes have joined, then activate the cluster.

https://pastebin.com/v0ghckBZ

    visor> top -activate
    visor> quit

  3. I connect with sqlline and create my tables:

https://pastebin.com/Q7KbjN2a

  4. I go to one of the servers, stop the service and delete the data, then start the service again:

    [root@b2bivmign2 apache-ignite]# systemctl stop apache-ignite@b2bi-config.xml.service
    [root@b2bivmign2 apache-ignite]# rm -rf db/ diagnostic/ snapshots/
    [root@b2bivmign2 apache-ignite]# systemctl start apache-ignite@b2bi-config.xml.service

  5. Baseline looks like this:

https://pastebin.com/CeUGYLE7

  6. Connect with sqlline to that node, issue reproduces:

https://pastebin.com/z4TMKYQq

This was reproduced on:

openjdk version "11.0.18" 2023-01-17 LTS
OpenJDK Runtime Environment Zulu11.62+17-CA (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.62+17-CA (build 11.0.18+10-LTS, mixed mode)

RPM: apache-ignite-2.14.0-1.noarch

Rocky Linux release 8.7 (Green Obsidian)

  • So, the scenario is - wipe clean the node, no manual changes to the baseline? – Alexandr Shapkin Jan 29 '23 at 12:01
  • Scenario is: Node shuts down, then starts again with a wiped disk. We use AWS ephemeral SSDs for the Ignite storage. So I think from Ignite point of view that node is a new node. I will add an update to the main article in a minute. – Sven Borkert Jan 29 '23 at 12:20
  • Thanks, and the node starts serving the data right after it's added to the baseline? I.e. after 60 secs? – Alexandr Shapkin Jan 29 '23 at 12:45
  • I mean, it's not waiting till the end of rebalance? – Alexandr Shapkin Jan 29 '23 at 12:45
  • As you see in my posting, I can connect to the new node via JDBC while it has not yet joined the baseline, execute a select on the table and will have an empty result. This is what happens with our (we did not develop it) application, it seems the JDBC driver randomly selects one node to connect to and if it selects the wrong node, it will see only empty results. – Sven Borkert Jan 29 '23 at 13:55
  • Weirdly enough, I can not reproduce that issue. – Alexandr Shapkin Jan 31 '23 at 15:23
  • I tested the following scenario: 1 baseline node, 1 non-baseline node, two replicated caches. Thin client & JDBC both worked as expected, connecting to the second (non-baseline) node and returning the same result as it was for the node 1. – Alexandr Shapkin Jan 31 '23 at 15:56
  • Blind guess: could it be that a client might be connecting to another server? – Alexandr Shapkin Jan 31 '23 at 16:16
  • Hi, thanks for your investigation. This is getting more interesting now. I'm sure I did not connect to the wrong server, we noticed this behavior on a production system and I reproduced it with local VMs easily. But now I created a table on my own, and with this table the issue does not reproduce. I see it only on the tables created by the product we use. So it must be caused by the specific configuration of the cache that backs those tables. I will add an update to the article and keep testing what attribute exactly makes the table behave like that. – Sven Borkert Jan 31 '23 at 18:46
  • It seems this happens only with tables backed by replicated caches, not with partitioned caches with backups, see "Update 4". I also tested and reproduced this with Ignite 2.13 again today, I will later test if I can reproduce it with 2.14 also. – Sven Borkert Feb 01 '23 at 14:05
  • Reproduced on Ignite 2.14 as well. – Sven Borkert Feb 01 '23 at 17:54
  • I also tested it with replicated caches, but no luck. Do you mind sharing a reproducer if you already have it? A github page or something? – Alexandr Shapkin Feb 02 '23 at 11:23
  • Hmm, that is really odd. I'm reproducing this manually, so nothing on GitHub. I guess the behaviour might be enabled by our Ignite configuration then. I will share the configuration file later, and I will go through the options to see if there is something that might be the cause. – Sven Borkert Feb 02 '23 at 11:54
  • This is the configuration used: https://pastebin.com/FJzA2Dah The 3 ignite systems have the exact same configuration file, except they each have their own jks keystore with their own key. Is the difference between your and my reproduction setup maybe the enabled persistence? I will do some tests with that and write an update, not sure if today. Thanks, Sven. – Sven Borkert Feb 02 '23 at 17:58
