3

I have an Apache Spark Job and one of its components fires queries at Apache Ignite Data Grid using Ignite SQL and the query is a SQLFieldsQuery. I was going through the thread dump and in one of the Executor logs I saw the following :

org.h2.mvstore.db.TransactionStore.begin(TransactionStore.java:229)

org.h2.engine.Session.getTransaction(Session.java:1580)

org.h2.engine.Session.getStatementSavepoint(Session.java:1588)

org.h2.engine.Session.setSavepoint(Session.java:793)

org.h2.command.Command.executeUpdate(Command.java:252)

org.h2.jdbc.JdbcStatement.executeUpdateInternal(JdbcStatement.java:130)

org.h2.jdbc.JdbcStatement.executeUpdate(JdbcStatement.java:115)

org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.connectionForThread(IgniteH2Indexing.java:428)

org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.connectionForSpace(IgniteH2Indexing.java:360)

org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryLocalSqlFields(IgniteH2Indexing.java:770)

org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:892)

org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:886)

org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)

org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:1666)

org.apache.ignite.internal.processors.query.GridQueryProcessor.queryLocalFields(GridQueryProcessor.java:886)

org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:698)

com.test.ignite.cache.CacheWrapper.queryFields(CacheWrapper.java:1019)

The last line in my code executes a sql fields query as follows :

SqlFieldsQuery sql = new SqlFieldsQuery(queryString).setArgs(args);
cache.query(sql);

According to my understanding, Ignite has its own data grid which it uses to store the cache data and indices. It only makes use of H2 database to parse the SQL query and get a query execution plan.

But, the Thread dump shows that updates are being executed and transactions are involved. I don't understand the need for transactions or updates in a SQL Select Query.

I want to know the following about the role of H2 database in Ignite :

  • I went into the open source code of Apache Ignite(version 1.7.0) and saw that it was trying to open a connection to a specific schema in H2 database by executing the query SET SCHEMA schema_name ( connectionForThread() method of IgniteH2Indexing class ). Is one schema or one table created for every cache ? If yes, what information does it contain since all the data is stored in ignite's data grid.

  • I also came across another interesting thing in the open source code which is that Ignite tries to derive the schema name in H2 from space name ( reference can be found in queryLocalSqlFields() method of IgniteH2Indexing class ). I want to know what does this space name indicate and is it something internal to Ignite or configurable ?

  • Would the setting of schema and connection to H2 db happen for each of my SQL query, if yes then is there any way to avoid this ?

Aniketh Jain
  • 603
  • 7
  • 25

1 Answers1

1
  1. Yes, we call executeUpdate to set schema. In Ignite 2.x we will be able to switch to Connection.setSchema for that. Right now we create SQL schema for each cache and you can create multiple tables in it, but this is going to be changed in the future. It does not actually contain anything, we just utilize some H2 APIs.
  2. Space name is basically the same thing as a cache name. You can configure SQL schema name for a cache using CacheConfiguration.setSqlSchema.
  3. If you run queries using the same cache instance, schema will not change.
Sergi Vladykin
  • 344
  • 1
  • 10
  • Thanks @Sergi for the quick reply. So, if I use CacheConfiguration.setSqlSchema and provide a schema name of my own for all my caches then, the query lookup performance should improve significantly, since it does not have to set schema for every query and can reuse only one connection to H2 database ? – Aniketh Jain May 09 '17 at 05:18
  • As you also mentioned a SQL schema is created per cache which can contain multiple tables in it. Why are multiple tables created per cache, what is the use of it ? . I thought it was one schema per Ignite Grid and it contained multiple tables, where each table represented an Ignite Cache. – Aniketh Jain May 09 '17 at 05:22
  • @AnikethJain As Sergi mentioned, currently you have the ability to create multiple tables in a single cache, but it is not recommended. Starting with 2.1 you will only be able to create one table per cache. – Dmitriy May 09 '17 at 15:55
  • @AnikethJain As far as the schema goes, in Ignite 2.0, cache name is also the schema name, which is not ideal, but for now is the only option. Starting with Ignite 2.1 (coming in June) you will be able to set and control the schema. – Dmitriy May 09 '17 at 15:58
  • @Sergi I think SET Schema is causing a unnecessary overhead while querying, Here is how I concluded it, I created 4 similar caches( same value class) and executed same select query on 4 caches serially in a single thread which took around 20% more time compared to executing 4 different queries on same cache serially( set schema will not be called here). – Benak Raj May 10 '17 at 05:09
  • @Dmitriy My concern over creation of different schema for different caches was due to the frequent use of SET SCHEMA while querying different caches in a grid. As Benak Raj mentioned it is leading to a significant overhead in query execution. – Aniketh Jain May 10 '17 at 08:01
  • @AnikethJain I guess you did not do any warm up before measuring and this 20% slowdown is unrelated here. If I'm wrong please provide a reproducer, we will investigate. – Sergi Vladykin May 11 '17 at 08:44