I am writing my own crawler, and my question is about indexes.
Two columns are involved in the index: pageurl and hashcode. The pageurl column is VARCHAR and the hashcode column is a 64-bit integer (BIGINT).
This is the main query I am executing:
SELECT PageId FROM tblPages WHERE HashCode=biginthashcode AND PageUrl='pageurl'
PageId is the identity primary key.
Until now I was using this index:
CREATE nonclustered INDEX indexHashCode ON tblpages (hashcode)
INCLUDE (pageurl,pageid)
But the index above does not prevent duplicate rows, and the multi-threaded nature of the software produces too many of them, probably because of SQL delays.
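A minimal sketch of the kind of check-then-insert sequence that could produce such duplicates, assuming each thread looks the page up and inserts it when no row is found. The variables @HashCode and @PageUrl, their values, and the VARCHAR length are hypothetical; the actual insert code is not shown in this question.

```sql
-- Hypothetical parameters; names, values and the VARCHAR length are assumptions.
DECLARE @HashCode BIGINT = 1234567890123;
DECLARE @PageUrl VARCHAR(200) = 'http://example.com/';

-- Step 1: check whether the page is already stored.
IF NOT EXISTS (SELECT PageId FROM tblPages
               WHERE HashCode = @HashCode AND PageUrl = @PageUrl)
BEGIN
    -- Step 2: another thread may insert the same row between step 1 and here.
    -- With a non-unique index nothing rejects the second insert.
    INSERT INTO tblPages (HashCode, PageUrl)
    VALUES (@HashCode, @PageUrl);
END
```

If two threads run step 1 at the same time for the same URL, both can find no row and both can reach the INSERT.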
So I have to either make the index unique, like below:
CREATE UNIQUE nonclustered INDEX indexHashCode ON tblpages (hashcode,pageurl)
INCLUDE (pageid)
Or somehow make it not add duplicate values. A duplicate value means a row in which both hashcode and pageurl are the same as in an existing row.
Is that possible with my first index, without creating a unique index like the one above?
I'm using Microsoft SQL Server 2008.