Compare strings by matching words

Question

I'd like to know if such an algorithm exists and is implemented in any database (ideally Postgres).

Levenstein matches strings, but I'd like to compare strings based on the number of matching words. For example, having:

the quick brown fox jumps over the lazy dog

I should come with a result of 2 if i try to match it against

the pen lies over the table

because it matched "the" and "over" on both the sentences

David Aldridge · Accepted Answer · 2013-05-23T18:50:04.480

1

Here's a SQL method using arrays:

select count(*) from
(
(select distinct unnest(string_to_array(upper('the quick brown fox jumps over the lazy dog'),' ')))
intersect all
(select distinct unnest(string_to_array(upper('the pen lies over the table'),' ')))
) t3

http://sqlfiddle.com/#!12/724f7/6

edited May 23 '13 at 18:50

answered May 23 '13 at 18:16

David Aldridge

51,479
8
68
96

score 0 · Answer 2 · edited May 23 '17 at 11:43

I believe you can use the same algorithms designed for letters also for words. See also this question. Comparing by words is not common and I'm pretty sure that PostgreSQL doesn't support it (nor any other database I know). However as long as you can use arrays, as suggested in answer by David Aldridge, you can write your own stored procedure for this.

You can take inspiration for the algorithm on Wikibooks, just replacing Strings for List<String>, size() for length() and char comparison for equals() would work. Now you can implement the same in SQL, all you need is array allocation (and some index arithmetic if you cannot use two-dimensional arrays). In the worst case, you can use a temporary table instead of arrays.

Compare strings by matching words

2 Answers2