1

Given a table with 2 cols, id (bigserial) and tags (varchar(64)[])

id|tags
--------
1 |a,b,c
2 |a,c
3 |d,e

what would an optimal query be that would fetch rows based on tag intersection count (ic)?

So searching for a,b,c would return

id|ic
-----
1 |3
2 |2
3 |0
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Sunder
  • 1,445
  • 2
  • 12
  • 22
  • I don't understand how the intersection count is defined. Can you explain it in more formal terms? – Kevin Dec 17 '14 at 01:11

2 Answers2

1
SELECT t.id, count(tag = ANY ('{a,b,c}') OR NULL) AS ic
FROM   tbl t
     , unnest(tags) x(tag)
GROUP  BY 1;

That's an implicit JOIN LATERAL. Rows with empty or NULL arrays are excluded. To include those, use:

LEFT JOIN unnest(tags) x(tag) ON TRUE

OTOH, if you are only interested in rows with at least one match you could pre-select with the overlap array operator && for better performance:

SELECT t.id, count(tag = ANY ('{a,b,c}') OR NULL) AS ic
FROM  (SELECT * tbl WHERE tags && '{a,b,c}') t
     , unnest(tags) x(tag)
GROUP  BY 1;

&& can be supported with a GIN index, which would make the difference for a big table.

More about the counting technique:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • You're amazing Erwin! Thank you! One more question if you don't mind -- what's x(tag) doing? And why can't I reference ic anywhere else in the query, say for filtering purposes with a "where ic > 0" – Sunder Dec 17 '14 at 01:55
  • 1
    @rebnoob: `unnest(tags) x(tag)` is short for `unnest(tags) AS x(tag)`, where `x` is a table alias and `tag` is a column alias. You could also write `unnest(tags) tag`, then `tag` would serve as both table *and* column alias. As for why you can't reference `ic` in the `WHERE` clause: http://stackoverflow.com/questions/14074546/how-to-re-use-result-for-select-where-and-order-by-clauses/14075977#14075977 or http://stackoverflow.com/questions/19848930/group-by-case-statement/19849537#19849537 – Erwin Brandstetter Dec 17 '14 at 02:04
0
SELECT id, IF(('a' = ANY(tags)),1,0) + IF(('b' = ANY(tags)),1,0) + IF(('c' = ANY(tags)),1,0) AS ic
FROM mytable
striving_coder
  • 798
  • 1
  • 5
  • 7