-2

I am using sql server and have a table with 2 columns
myId varchar(80)
cchunk varchar(max)

Basicly it stores large chunk of text so thats why I need it varchar(max). My problem is when I do a query like this:

select *
from tbchunks
where
  (CHARINDEX('mystring1',tbchunks.cchunk)< CHARINDEX('mystring2',tbchunks.cchunk))
AND
  CHARINDEX('mystring2',tbchunks.cchunk) - CHARINDEX('mystring1',tbchunks.cchunk) <=10

It takes about 3 seconds to complete, and the table chunks is about only 500,000 records and data returned from the above query is anywhere between 0 to 800 max

I have unclustered index on myid column, it helped with making fast select count(*) but didnt help with the above query.

I tried using Fulltext but was slow. i tried spliting the text in cchunk into smaller parts and adding an id column that will connect all those splited chunks, but ended up with a table with 2 million records of splited chunks of text (i did that so i can add index) but still the query was even slower.

EDIT: modified the table to include primary key (int) created fultext catalog with "Accent Senstive=true" created fulltext index on my tabe on column "cchunk" ran the same above query and it ended up taking 22 seconds with is much slower

UPDATE Thanks everyone for suggesting the FullText (@Aaron Bertrand thanks!), i converted my query to this

SELECT * FROM tbchunksAS FT_TBL INNER JOIN CONTAINSTABLE(tbchunks, cchunk, '(mystring1 NEAR mystring2)') AS KEY_TBL ON FT_TBL.cID = KEY_TBL.[KEY]

by the way the cID is the primary key i added later. anyway i am getting borad results and i notice that the higher the RANK column that was returned the better the results. my question is when RANK starts to get accurate?

user2280232
  • 155
  • 2
  • 8
  • 4
    Well, have you considered [Full-Text Search](http://msdn.microsoft.com/en-us/library/ms142571.aspx) and configuring it correctly? Saying you "tried it and it was slow" does not give a lot of confidence that you created the catalog correctly, populated one or more indexes correctly, ran the right query against the data to yield accurate and efficient proximity search results, etc. How long did you spend with full-text search exactly? – Aaron Bertrand Apr 14 '13 at 18:35
  • not so long, i just set up the catalog and tested it and was slow, so i forgot about it. anyway will i be able to use charindex? – user2280232 Apr 14 '13 at 18:38
  • 4
    I drove a Porsche last week and it was slower than my wife's Nissan. Oh I forgot to mention there was a beer bottle lodged under the gas pedal! It's tough to tell you why Full-Text Search was slower if we have no idea how you configured it and what queries you ran against it. – Aaron Bertrand Apr 14 '13 at 18:40
  • Also is `'mystring1'/'mystring2'` always a hard-coded part of this query, or is it actually a variable? – Aaron Bertrand Apr 14 '13 at 18:40
  • 2
    No, you don't use `CHARINDEX` with full-text search. See? You've dismissed the technology that is designed specifically for your use case but didn't test it appropriately. – Aaron Bertrand Apr 14 '13 at 18:41
  • Fulltext is the answer, please show us your fulltext settings and query. You cant find best performance with charindex. – highwingers Apr 14 '13 at 18:42
  • "mystring1/mystring2" are both variables...let me setup fulltext search quickly and will post my query – user2280232 Apr 14 '13 at 18:45
  • ok i posted an edit its slower :(((( – user2280232 Apr 14 '13 at 18:52
  • Of course the same query is going to run slow. CHARINDEX isn't going to use your full-text index. Could you please *read* the link I posted in my initial comment? – Aaron Bertrand Apr 14 '13 at 18:53
  • yea i am reading it, but am seeing that "is" is a stop word. but i need to search for the word "is" which can be mystring1 also i need the location of mystring1 and mystring2 so that i can compare them. it dont only need to know they are there, but rather where are they – user2280232 Apr 14 '13 at 18:57
  • Did you try with simply Contains keyword? – highwingers Apr 14 '13 at 21:11

2 Answers2

2

An index isn't going to help with CHARINDEX at all. An index on a particular column is only going to be able to quickly find rows where the value in the indexed field is exactly an indexed value. I'm actually quite surprised that query only takes 3 seconds given that it has to read every single row four times (or at the very least, twice).

Colselaw
  • 1,069
  • 9
  • 21
  • yea so what do you suggest then? – user2280232 Apr 14 '13 at 19:10
  • An index can, to the extent that it reduces I/O, increase the performance of some queries. If the table in question had larger rows then all of the `cchunk` values could be read faster using an index instead of the data pages. Not likely to help much in the OP's specific case. A [covering index](http://stackoverflow.com/questions/609343/what-are-covering-indexes-and-covered-queries-in-sql-server) can do surprising things in the right environment. – HABO Apr 14 '13 at 19:26
1

Well as good the ideas that were presented here, no body manage to really solve my problem but rather provided helpful tips that lead me to the solution which i would love to share.

Using Full text really was the answer like many mentioned but i managed to use the Contains in combination with Near so it can totally replace my current sql query and provide an awesome speed.

CONTAINS(tbchunks, 'NEAR ((mystring1, mystring2), 3, TRUE)')

user2280232
  • 155
  • 2
  • 8
  • The equivalent of this for CHARINDEX is CONTAINS(tbchunks, 'NEAR ((mystring1, mystring2), 0, TRUE)') - which matches "mystring1 mystring2" with exactly one space between – snez May 23 '13 at 14:36