SQL Server index for large nvarchar column failing

Question

I have an nvarchar column of size 2000. Sometimes when I am inserting, it's failing with the error:

System.Data.SqlClient.SqlException: Operation failed. The index entry of length 1146 bytes for the index 'NonClusteredIndex-20161206-202443' exceeds the maximum length of 900 bytes.

But I don't know what to do. This table has 45m records so it has to have an index.

Any ideas?

Possible duplicate of [SQL server - worth indexing large string keys?](http://stackoverflow.com/questions/8001905/sql-server-worth-indexing-large-string-keys) — Andrey Korneyev, May 10 '17 at 12:34
What type of column is that? And what string do you have that is so long and you need to index it? What are you doing with that column? Are you using it for a search function with `LIKE '%abc'` ? If you intend to search withing the string of that column may I suggest using a FULL-TEXT index? — Radu Gheorghiu, May 10 '17 at 12:38
It's a translation column , and I am using it with equal operator , select * from table where column = 'text' — Mazen Abu Taweelih, May 10 '17 at 12:44
Then *don't* do so. This is a very bad design. Translation tables typically contain a phrase ID and translations in various languages. Why use *one* of those translations as a key? — Panagiotis Kanavos, May 10 '17 at 12:45
@MazenAbuTaweelih using a key that has business meaning is a bad idea. Using a *phrase* as a key, that can change for the slightest reason (eg spell-checking) is a very bad idea. — Panagiotis Kanavos, May 10 '17 at 12:47
@PanagiotisKanavos: It depends on how are designed those tables. [1] `dbo.OriginalText(ID PK, OriginalText) dbo.Language(ID PK, Name UQ) dbo.Translation(ID PK, IDOriginalText FK, IDLanguage FK, TranslatedText)` could be a solution but (for example) [2] `dbo.Translation(ID PK, OriginalText FK, IDLanguage FK, TranslatedText) dbo.Language(ID PK, Name UQ)` not. — Bogdan Sahlean, May 10 '17 at 15:25

score 3 · Answer 1 · answered May 10 '17 at 12:40

3

Indexing such a large string column is usually a rather bad sign. I would first ask if you really want a full-text index. That is much more useful for text than just a regular index.

If you do need an index, then you can use a computed column to extract a certain amount of characters and build an index on that:

alter table t add col100 as (left(col, 100));

create index ind_t_col100 on t(col100);

Be sure to use col100 for any expressions that should use the index.

If your purpose for the index is to guarantee uniqueness, then you can calculate a checksum and the first 100 character and create a unique index on those values.

answered May 10 '17 at 12:40

Gordon Linoff

1,242,037
58
646
786

It's a translations table , The index used is to retrieve a translation if it already exists in the table so I don't translate it again. – Mazen Abu Taweelih May 10 '17 at 12:43
@MazenAbuTaweelih Then you can do a checksum / hash etc. of that entire text and check if that checksum / hash already exists, as Gordon already mentioned – Radu Gheorghiu May 10 '17 at 12:44
No need for a hash. Use a phrase ID/resource ID and translation in various languages. What it the (arbitrary) primary phase changes? What if it has a typo? – Panagiotis Kanavos May 10 '17 at 12:46
1

@PanagiotisKanavos What if he's translating entire texts and doesn't want to do that translation all over again? – Radu Gheorghiu May 10 '17 at 12:47
1

I am not trying to extract words translation , I am just translation the phrase as is , and there is another parameter to specify the language , the exact query is something like this , select * from table where text = 'text' and destinationlanguage = 'xx' – Mazen Abu Taweelih May 10 '17 at 12:48
Even worse - the "key/hash" is going to change a lot more often, becoming essentially useless. Where would the "primary" text be stored? How could anyone find any results if the primary and database text differ by a single dot? – Panagiotis Kanavos May 10 '17 at 12:49
@MazenAbuTaweelih and the way such tables are designed, is that a ResourceID and language are used to identify the text used in each translation – Panagiotis Kanavos May 10 '17 at 12:50
@PanagiotisKanavos I agree, but I don't think that's the database's problem. It's something for the application to handle. – Radu Gheorghiu May 10 '17 at 12:50
@PanagiotisKanavos Yes , there is a resource id and language used to identify , It's working totally fine . My whole objective is just to make the query return records faster and remove that error that's happening , Currently It's taking arround 2 seconds but I feel it can be faster. – Mazen Abu Taweelih May 10 '17 at 12:51
Dears,Just for your information , I found that emoji's checksum is zero , this means that the checksum calculation might be wrong for some texts if they include any emoji – Mazen Abu Taweelih May 10 '17 at 13:04
@MazenAbuTaweelih: Could you specify what datatype are you using ? Also, could you add that text having emoji are you using ? For example: `SELECT CHECKSUM(HASHBYTES('sha1', N''))` returns `917799192`. – Bogdan Sahlean May 10 '17 at 13:24
@PanagiotisKanavos: I belive this is a false problem. I don't see any mention regarding any `UPDATE` statement executed to changed the original text. – Bogdan Sahlean May 10 '17 at 13:29
@BogdanSahlean that's an unfounded belief then. It assumes that no typos are ever going to be discovered. The chances of that are precisely 0 – Panagiotis Kanavos May 11 '17 at 07:00
@PanagiotisKanavos Updating original text will generate at some point some page splits. Assuming some updates will be executed on `OriginalText` column, because index key is defined on `INT`eger column (data type of `CHECKSUM` result is `INT`) the number of these page splits will be low. From my point of view, this is a tentative to transfer a DBA issue (index fragmentation) on dev side. – Bogdan Sahlean May 11 '17 at 08:37

SQL Server index for large nvarchar column failing

1 Answers1