If you only want to match whole words, this is going to be considerably faster with a bigger table because it can use an index on entry
(unlike your solution, where the predicate is not sargable):
Simple
Assuming white space as word separator (\s+
).
SELECT *
FROM black_list
JOIN regexp_split_to_table('the task description', '\s+') entry USING (entry)
Advanced
For more specific needs, you can tailor the words you extract:
Split words (as defined by the POSIX standard) at \m
. The manual:
matches only at the beginning of a word
Remove leading, trailing or all characters you want to ignore. Since we split at the beginning of words, trailing characters may be enough.
Finally cast to lower case - assuming lower case entries in black_list
:
Basic and fast
Split words and trim given trailing characters:
SELECT *
FROM black_list
JOIN (
SELECT rtrim(w, ' /\,.-') AS entry -- add more?
FROM regexp_split_to_table('Oh - my, a4b smurf-village in/out.', '\m') w
) w USING (entry);
Thorough
Split words, remove all non-word characters and convert to lower case:
SELECT *
FROM black_list
JOIN (
SELECT lower(regexp_replace(w, '\W+', '')) AS entry
FROM regexp_split_to_table('Oh - my, a4b smurf-village in/out.', '\m') w
) w USING (entry);
SQL Fiddle
Beyond
You can take it one step further and test for similarity to also catch typos. But you have to weigh alpha against beta error (incorrectly excluding legal words).
SELECT set_limit(0.9); -- high example value
SELECT *
FROM black_list b
JOIN (
SELECT lower(regexp_replace(w, '\W+', '')) AS entry
FROM regexp_split_to_table('Oh - my, a4b smurf-village in/out.', '\m') w
) w ON b.entry % w.entry
The %
operator requires the additional module pg_trgm. Details: