TiDB executing full table scan

Question

I am using TiDB version 6.5.1, I wanted to know the behavior of TiKV, let's say I have a table t which contains composite index example (a,b), when I am trying to execute following cases,

1.select sum(c) from t where a=123 and b='simple' group by b; range scan happing as expected.

2.select sum(c) from t where a>=123 and b='simple' group by b; here I am passing indexed columns in where clause, So why range scan is not happening and table full scan can causes performance issues when table size is big.

3.select sum(a) from t where a>=123 and b='simple' group by b; if I use indexed column in select range scan happening.

4.select sum(c) from t where a>=123 group by a; same behavior like 2.

I have a requirement like can pass total index or left prefix index with >=, <=, between, like operators to support ad-hoc query, So TiKV will support this without full table scan?

Please suggest table design if any changes required, here I am planning to use TiKV + TiSpark to cover entire HTAP usecase.

Thanks,

Ajay Babu Maguluri.

Daniël van Eeden · Answer 1 · 2023-04-13T15:40:44.447

TiDB like any database has an optimizer that based on limited data (statistics) and in limited time needs to find an acceptable execution plan.

The table scan might be cheaper than other plans. You can restrict the plans the optimizer can take with hints and see what the cost is for each plan.

sql> CREATE TABLE t1(id INT PRIMARY KEY, c1 VARCHAR(255), KEY(c1));
Query OK, 0 rows affected (0.1698 sec)

sql> INSERT INTO t1 VALUES (1,'test'),(2,'another test');
Query OK, 2 rows affected (0.0154 sec)

Records: 2  Duplicates: 0  Warnings: 0

sql> ANALYZE TABLE t1;
Query OK, 0 rows affected, 1 warning (0.0950 sec)
Note (code 1105): Analyze use auto adjusted sample rate 1.000000 for table test.t1

sql> EXPLAIN FORMAT=VERBOSE SELECT * FROM t1 WHERE c1='test';
+--------------------+---------+---------+-----------+------------------------+-----------------------------------------+
| id                 | estRows | estCost | task      | access object          | operator info                           |
+--------------------+---------+---------+-----------+------------------------+-----------------------------------------+
| IndexReader_6      | 1.00    | 21.17   | root      |                        | index:IndexRangeScan_5                  |
| └─IndexRangeScan_5 | 1.00    | 199.38  | cop[tikv] | table:t1, index:c1(c1) | range:["test","test"], keep order:false |
+--------------------+---------+---------+-----------+------------------------+-----------------------------------------+
2 rows in set, 1 warning (0.0149 sec)
Note (code 1105): [c1] remain after pruning paths for t1 given Prop{SortItems: [], TaskTp: rootTask}

sql> EXPLAIN FORMAT=VERBOSE SELECT * FROM t1 IGNORE INDEX(c1) WHERE c1='test';
+---------------------+---------+---------+-----------+---------------+------------------------+
| id                  | estRows | estCost | task      | access object | operator info          |
+---------------------+---------+---------+-----------+---------------+------------------------+
| TableReader_7       | 2.00    | 82.22   | root      |               | data:Selection_6       |
| └─Selection_6       | 2.00    | 997.11  | cop[tikv] |               | eq(test.t1.c1, "test") |
|   └─TableFullScan_5 | 4.00    | 797.51  | cop[tikv] | table:t1      | keep order:false       |
+---------------------+---------+---------+-----------+---------------+------------------------+
3 rows in set (0.0021 sec)

Here the optimizer uses a IndexRangeScan and then in the second query we exclude the index and then it takes a TableFullScan which is much more expensive as you can see in the estCost column.

For questions like this it might be useful to share the output of SHOW CREATE TABLE... for the tables involved. The data or some description of the data (e.g. how unique the 123 value is) would also be helpful.

From the images you posted it looks like you have an index called a_b_index on (a, b). This means that TiDB can't use the second column of the index if you don't have a equality (=) match on the first column. Switching the order might be good as for the queries here you always do a equality match on the b column and a range match on the a column. But I can't see the full range of queries that you do so other queries might perform worse after this.

Here is some good explanation about this: https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys

Thanks for reply. I have a requirement like, 1. for OLTP case can query "b" as equivalent and "a" as range. 2. for OLAP case only "a" as range. So index design should both of these workloads. — Ajay, Apr 14 '23 at 05:18
generally MySQL will support range on first column of the composite index. — Ajay, Apr 14 '23 at 07:01
You might want to use two indexes: `(b,a)` and `(a)` in that case. TiDB also supports range on the first column of the index, just like MySQL. But if you do range on the first column you can't do equality on the second column. — Daniël van Eeden, Apr 14 '23 at 09:07
Non indexed column in select clause, let's say select c form table t where a>=123 also going full scan? — Ajay, Apr 14 '23 at 09:35
Selecting non-indexed columns is fine. That might mean it can't do a index-only scan like it would with a covering index but it should still perform quite well. — Daniël van Eeden, Apr 14 '23 at 13:35
1. In ad-hoc query cases can pass any column in select clause so full table scan means then what is the use of index. 2. When non indexed column used in where along with first column of composite index also shows full scan, not using index anywhere. is this 2 cases behavior of TiDB only? Ex: MySQL can handle this. — Ajay, Apr 15 '23 at 04:14

TiDB executing full table scan

1 Answers1