1

If I write a MySQL query that searches or sorts based on a field that has no index, does the db engine have to do a full table scan?

What if I do this in Presto or Cassandra?

MedicineMan
  • 15,008
  • 32
  • 101
  • 146
  • Trino (formerly Presto SQL) can sort large amounts of data in distributed (thus: scalable) manner. The data still needs to be fully read (full table scan) for that. – Piotr Findeisen Feb 21 '21 at 20:13

2 Answers2

0

Yes, and you can confirm this with MySQL's EXPLAIN statement, which shows how the optimizer plans to access the table.

Here's an example where I search on an indexed column, so it tells me which key (that is, index) it will use.

mysql> explain select * from mytable where idx_column=123\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mytable
   partitions: NULL
         type: ref
possible_keys: idx_column
          key: idx_column
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL

Here's an example where it shows type: ALL which means it must examine all rows in the table. I.e. a table-scan.

mysql> explain select * from mytable where non_idx_column=123\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mytable
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

There's a lot more to learn about EXPLAIN, so I recommend reading about it in the manual and in blogs.

Presto has its own version of EXPLAIN (https://prestodb.io/docs/current/sql/explain.html) and the output has totally different information.

Cassandra doesn't have an EXPLAIN statement, but you can use request tracing to give you some different kind of information about the query execution. Doing a table-scan can be a risk for performance. See Issue in full table scan in cassandra

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

Presto meaning prestodb.io doesn't have indexes. But if the predicate can be pushed down, it will be depending on the connector that is reading the underlying data.

The Hive connector (used for reading HDFS or S3) for example is fairly mature and can push down predicates to reduce the amount of IO.

Here's a blog that explain some of the behavior: https://prestodb.io/blog/2019/12/23/improve-presto-planner

  • Dipti
Dipti
  • 56
  • 1