I am executing a query like
select <column> from <mytable> where <partition_key> = <value> limit 10
and it is taking FOREVER to execute. I looked at the physical plan and saw a HiveTableScan in there, which looked fishy. Does that mean the query is scanning the entire table? (The snippet I use to inspect the plan is at the end of the question.) I was expecting the query to
A. scan exactly 1 partition and no more
B. end the scan as soon as it returns 10 rows
Is my understanding incorrect? How do I make Spark do exactly this?
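For reference, here is roughly how I am inspecting the plan. This is a minimal sketch: `my_table`, `event_date`, and `some_column` are placeholders standing in for my real Hive-partitioned table, partition key, and selected column.

```python
from pyspark.sql import SparkSession

# Placeholder session setup; in my job Hive support is already enabled
# so Spark can read the table's metadata from the Hive metastore.
spark = (
    SparkSession.builder
    .appName("partition-pruning-check")
    .enableHiveSupport()
    .getOrCreate()
)

# Placeholder version of the query from above.
query = """
    SELECT some_column
    FROM my_table
    WHERE event_date = '2024-01-01'
    LIMIT 10
"""

# Print the parsed, analyzed, optimized, and physical plans. In the
# physical plan I see a HiveTableScan node, and I'm trying to work out
# whether it carries a partition filter (i.e. pruning actually happened)
# and whether the LIMIT stops the scan early.
spark.sql(query).explain(extended=True)
```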