I am using sql server and have a table with 2 columns
myId varchar(80)
cchunk varchar(max)
Basicly it stores large chunk of text so thats why I need it varchar(max)
.
My problem is when I do a query like this:
select *
from tbchunks
where
(CHARINDEX('mystring1',tbchunks.cchunk)< CHARINDEX('mystring2',tbchunks.cchunk))
AND
CHARINDEX('mystring2',tbchunks.cchunk) - CHARINDEX('mystring1',tbchunks.cchunk) <=10
It takes about 3 seconds to complete, and the table chunks is about only 500,000 records and data returned from the above query is anywhere between 0 to 800 max
I have unclustered index on myid column, it helped with making fast select count(*) but didnt help with the above query.
I tried using Fulltext but was slow. i tried spliting the text in cchunk into smaller parts and adding an id column that will connect all those splited chunks, but ended up with a table with 2 million records of splited chunks of text (i did that so i can add index) but still the query was even slower.
EDIT: modified the table to include primary key (int) created fultext catalog with "Accent Senstive=true" created fulltext index on my tabe on column "cchunk" ran the same above query and it ended up taking 22 seconds with is much slower
UPDATE Thanks everyone for suggesting the FullText (@Aaron Bertrand thanks!), i converted my query to this
SELECT * FROM tbchunksAS FT_TBL INNER JOIN CONTAINSTABLE(tbchunks, cchunk, '(mystring1 NEAR mystring2)') AS KEY_TBL ON FT_TBL.cID = KEY_TBL.[KEY]
by the way the cID is the primary key i added later. anyway i am getting borad results and i notice that the higher the RANK column that was returned the better the results. my question is when RANK starts to get accurate?