2

Consider the following entries in my table:

  • red apple
  • yellow apple
  • apple green
  • red aple
  • appel yellow

Several people have populated this table using a non-consistend notation (the color before or after 'apple'), also entering some spelling errors. Now I want to query all entries with the word apple, regardless of color or spelling.

With FUZZY():

SELECT name FROM "NEO_123456789ABCDE"."MYTABLE1" WHERE contains(name, 'apple', FUZZY(0.5))  

I only get:

  • red apple
  • red aple

When adding wildcards:

SELECT name FROM "NEO_123456789ABCDE"."MYTABLE1" WHERE contains(name, '%apple%', FUZZY(0.5)) 

I only get all entries, where apple was spelled right:

  • red apple
  • yellow apple
  • apple green

Why I can't combine both operators LIKE and CONTAINS in one query?

I need to find:

  • entries, where apple is surrounded by other words (in my case colors)
  • all forms of apple (regardless of the spelling)
Sandra Rossi
  • 11,934
  • 5
  • 22
  • 48
Evgenij Reznik
  • 17,916
  • 39
  • 104
  • 181
  • Why not just change the "fuzzy level": `WHERE contains(name, 'apple', FUZZY(0.2))` seems to do what you want. –  Oct 30 '15 at 12:11
  • a fuzzy value of 0.2 is usually much too low to return meaningful results – Pascalius Oct 05 '16 at 09:48

3 Answers3

2
select name from(
         SELECT name FROM "NEO_123456789ABCDE"."MYTABLE1" WHERE contains(name, 'apple', FUZZY(0.2)) --Part I
         UNION ALL
         SELECT name FROM "NEO_123456789ABCDE"."MYTABLE1" WHERE contains(name, '%apple%') --Part II
)group by name

This query basically joins the search results from contains clause with fuzzy search and normal search. You can also replace the Part II of query with like instead.

2

The fuzzy algorithm matches against the complete content of the column in your example. So it compares "apple" with "red apple" and "appel yellow"

What you want is a matching against the tokens in your columns. You can achieve this by creating a fulltext index, which will tokenize the content in the columns into a fulltext index. contains() will automatically use the fulltext index.

drop table MYTABLE1;
create column table MYTABLE1 
(
  name nvarchar(100)
);

insert into MYTABLE1 (name) values ('red apple');
insert into MYTABLE1 (name) values ('yellow apple');
insert into MYTABLE1 (name) values ('apple green');
insert into MYTABLE1 (name) values ('red aple');
insert into MYTABLE1 (name) values ('appel yellow');


CREATE FULLTEXT INDEX i_MYTABLE1 ON MYTABLE1(name) FUZZY SEARCH INDEX ON SYNC;

SELECT name FROM "MYTABLE1" WHERE contains(name, 'apple', FUZZY(0.5)) 
Pascalius
  • 14,024
  • 4
  • 40
  • 38
0

Perhaps you should try the following standard sql query :

SELECT name FROM "NEO_123456789ABCDE"."MYTABLE1" WHERE name like '%apple%'

This query will select your apples. Concerning your fuzzy, why not using it with a subquery ? Something like :

    SELECT name FROM "NEO_123456789ABCDE"."MYTABLE1" 
    WHERE contains(name, 'apple', FUZZY(0.5))
    and name in (
    SELECT name FROM "NEO_123456789ABCDE"."MYTABLE1" WHERE name like '%apple%'
    )
RMS
  • 26
  • 2