I have a table of files paths with hashes of its content, there could be multiple files with the same hash.
create table files(
path varchar(256) not null,
hash varchar(100) not null
);
create index files_hash on files (hash);
Given I have array of 3 hashes 'a', 'b', 'c'
, how can I find efficiently which of those hashes the files
table contains?
I can use select distinct hash
to get hashes that exist in files
:
select distinct hash
from files
where hash in ('a', 'b', 'c')
But would it be efficient? Like say there are hundreds of thousands of files with hash 'a'
, would PostgreSQL iterate over all those records? Is there a way to tell it to stop immediately as soon as it finds the first one?