I have a table that looks like (with an example of number of rows in each to get the kind of ration):
expectedreportsnodes (1 000 000 rows):
nodejoinkey | integer | not null
nodeid | text | not null
nodeconfigids | text[] |
nodeconfigids
array generally contains 1-50 values.
And a second table:
expectedreports
(10 000 rows):
pkid | integer | not null
nodejoinkey| integer | not null
...
I want to query for all expected reports for which their exists a entry in nodeexpectedreports
with a given nodeConfigId
.
I have potentially large amount of nodeConfigIds
(thousands).
What is most efficient way of doing that?
For now, I have:
select E.pkid, E.nodejoinkey from expectedreports E
inner join (
select NN.nodejoinkey, NN.nodeid, NN.nodeconfigids from (
select N.nodejoinkey, N.nodeid, unnest(N.nodeconfigids) as nodeconfigids
from expectedreportsnodes N
) as NN
where NN.nodeconfigids) IN( VALUES ('cf1'), ('cf2'), ..., ('cf1000'), ..., ('cfN') )
) as NNN on E.nodejoinkey = NNN.nodejoinkey;
This seems to give the expected results but takes ages to execute.
What can be done to improve the query?
Updates:
- the proposed answer with array overlap and indexes is vastly less efficient on my set-up. I'm not able to say why.
- the following version seems to be the quickiest (again, not the least idea why - perhaps because I generally have few values in values in nodeconfigids?):
_
select E.pkid, E.nodejoinkey from expectedreports E
inner join (
select NN.nodejoinkey, NN.nodeconfigids
from (
select N.nodejoinkey, N.nodeconfigids,
generate_subscripts(N.nodeconfigids,1) as v
from expectedreportsnodes N
) as NN
where NN.nodeconfigids[v] in(values ('cf1'), ('cf2'), ..., ('cf1000'), ..., ('cfN') )
) as NNN
on E.nodejoinkey = NNN.nodejoinkey