If I write a MySQL query that searches or sorts based on a field that has no index, does the db engine have to do a full table scan?
What if I do this in Presto or Cassandra?
If I write a MySQL query that searches or sorts based on a field that has no index, does the db engine have to do a full table scan?
What if I do this in Presto or Cassandra?
Yes, and you can confirm this with MySQL's EXPLAIN statement, which shows how the optimizer plans to access the table.
Here's an example where I search on an indexed column, so it tells me which key (that is, index) it will use.
mysql> explain select * from mytable where idx_column=123\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: mytable
partitions: NULL
type: ref
possible_keys: idx_column
key: idx_column
key_len: 5
ref: const
rows: 1
filtered: 100.00
Extra: NULL
Here's an example where it shows type: ALL
which means it must examine all rows in the table. I.e. a table-scan.
mysql> explain select * from mytable where non_idx_column=123\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: mytable
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where
There's a lot more to learn about EXPLAIN, so I recommend reading about it in the manual and in blogs.
Presto has its own version of EXPLAIN (https://prestodb.io/docs/current/sql/explain.html) and the output has totally different information.
Cassandra doesn't have an EXPLAIN statement, but you can use request tracing to give you some different kind of information about the query execution. Doing a table-scan can be a risk for performance. See Issue in full table scan in cassandra
Presto meaning prestodb.io doesn't have indexes. But if the predicate can be pushed down, it will be depending on the connector that is reading the underlying data.
The Hive connector (used for reading HDFS or S3) for example is fairly mature and can push down predicates to reduce the amount of IO.
Here's a blog that explain some of the behavior: https://prestodb.io/blog/2019/12/23/improve-presto-planner