What types of values do you have for RelatedUser
/ UID
? Why, exactly, are you using NVARCHAR(100)
for it? NVARCHAR
is usually a horrible choice for a PK / FK field. Even if the value is a simple, alphanumeric code (e.g. ABTY1245
) there are better ways of handling this. One of the main problems with NVARCHAR
(and even with VARCHAR
for this particular issue) is that, unless you are using a binary collation (e.g. Latin1_General_100_BIN2
), every sort and comparison operation will apply the full range of linguistic rules, which can be well worth it when working with strings, but unnecessarily expensive when working with codes, especially when using the typically default case-insensitive collations.
Some "better" (but not ideal) solutions would be:
- If you really do need Unicode characters, at least specify a binary collation, such as
Latin1_General_100_BIN2
.
- If you do not need Unicode characters, then switch to using
VARCHAR
which will take up half the space and sort / compare faster. Also, still use a binary Collation.
Your best bet is to:
- Add an
INT IDENTITY
column to the User
table, named UseID
- Make
UserID
the Clustered PK
- Add an
INT
(no IDENTITY
) column to the Related
table, named UserID
- Add an FK from
Related
back to User
on UserID
- Remove the
RelatedUser
column from the Related
table.
- Add a non-clustered, Unique Index to the
User
table on the UserCode
column (this makes it an "alternate key")
- Drop and recreate the
UserIdInput
User-Defined Table Type to have an INT
datatype instead of NVARCHAR(100)
- If at all possible, alter the
ID
column of the User
table to have a binary collation (i.e. Latin1_General_100_BIN2
)
- If possible, rename the current
Id
column in the User
table to be UserCode
or something like that.
- If users are entering in the "Code" values (meaning: cannot guarantee they will always use all upper-case or all lower-case), then best to add an
AFTER INSERT, UPDATE
Trigger on the User
table to ensure that the values are always all upper-case (or all lower-case). This will also mean that you need to make sure that all incoming queries using the same all upper-case or all lower-case values when searching on the "Code". But that little bit of extra work will pay off.
The entire system will thank you, and show you its appreciation by being more efficient :-).
One other thing to consider: the TVP is a table-variable, and by default those only ever appear to the query optimizer to have a single row. So it makes some sense that adding a few thousand entries into the TVP would slow it down. One trick to help speed up TVP in this scenario is to add OPTION (RECOMPILE)
to the query. Recompiling queries with table variables will cause the query optimizer to see the true row count. If that doesn't help any, the other trick is to dump the TVP table variable into a local temporary table (i.e. #TempUserIDs
) as those do maintain statistics and optimize better when you have more than a small number of rows in them.
From O.P.'s comment on this answer:
[UID] is an ID used across our system (XXX-Y-ZZZZZZZZZZ...), XXX being letters, Y being a number and Z being numbers
Yes, I figured it was an ID or code of some sort, so that doesn't change my advice. NVARCHAR
, especially if using a non-binary, case-insensitive collation, is probably one of the worst choices of datatype for this value. This ID should be in a column named UserCode
in the User
table with a non-clustered index defined on it. This makes it an "alternate" key and a quick and easy lookup from the app layer, one time, to get the "internal" integer value for that row, the INT IDENTITY
column as the actual UserID
(is usually best to name ID columns as {table_name}ID for consistency / easier maintenance over time). The UserID
INT value is what goes into all related tables to be the FK. An INT
column will JOIN much faster than an NVARCHAR
. Even using a binary collation, this NVARCHAR
column, while being faster than its current implementation, will still be at least 32 bytes (based on the given example of XXX-Y-ZZZZZZZZZZ
) whereas the INT
will be just 4 bytes. And yes, those extra 28 bytes do make a difference, especially when you have 13 million rows. Remember, this isn't just disk space that these values take up, it is also memory since ALL data that is read for queries goes through the Buffer Pool (i.e. physical memory!).
In this scenario, however, we're not following the foreign keys anywhere, but directly querying on them. If they're indexed, should it matter?
Yes, it still does matter since you are essentially doing the same operation as a JOIN: you are taking each value in the main table and comparing it to the values in the table variable / TVP. This is still a non-binary, case-insensitive (I assume) comparison that is very slow compared to a binary comparison. Each letter needs to be evaluated against not just upper and lower case, but against all other Unicode Code Points that could equate to each letter (and there are more than you think that will match A - Z
!). The index will make it faster than not having an index, but nowhere near as fast as comparing one simple value that has no other representation.