Problem
The task has the potential to stall your database server for tables of non-trivial size, since it's an O(N²) problem as long as you cannot utilize an index for it.
In a sequential scan you have to check every possible combination of two rows, that's n * (n-1) / 2
combinations - Postgres will run n * n-1
tests since it's not easy to rule out reverse duplicate combinations. If you are satisfied with the first match, it gets cheaper - how much depends on data distribution. For many matches, Postgres will find a match for a row early and can skip testing the rest. For few matches, most of the checks have to be performed anyway.
Either way, performance deteriorates rapidly with the number of rows in the table. Test each query with EXPLAIN ANALYZE
and 10, 100, 1000 etc. rows in the table to see for yourself.
Solution
Create a trigram index on word
- preferably GIN.
CREATE INDEX tbl_word_trgm_gin_idx ON tbl USING gin (word gin_trgm_ops);
Details:
The queries in both answers so far wouldn't use the index even if you had it. Use a query that can actually work with this index:
To list all matches (according to the question body):
Use a LATERAL CROSS JOIN
:
SELECT t2.word_id, t2.word, t1.word_id, t1.word
FROM tbl t1
, LATERAL (
SELECT word_id, word
FROM tbl
WHERE word_id <> t1.word_id
AND word like format('%%%s%%', t1.word)
) t2;
To just get rows that have any match (according to your title):
Use an EXISTS
semi-join:
SELECT t1.word_id, t1.word
FROM tbl t1
WHERE EXISTS (
SELECT 1
FROM tbl
WHERE word_id <> t1.word_id
AND word like format('%%%s%%', t1.word)
);