Increase MS SQL transaction performance

Question

If you're missing information, I will attach them if requested.

Workspace

I have a database running on a MS SQL 2012 standard edition of this kind:

tables:
- users(id, softId (not unique), birthdate)
  - rows: 10.5 million
  - indexes: all three columns, birthdate(clustered)
- docs(docId, userId, creationDate, deleteDate, lastname, forename, classificationId)
  - rows: 23 million
  - indexes: lastname, forename, docId, creationDate, userID(clustered)
  - notice: in this specific case the names are related to the docs, not to the userId
- classifications(id, description)
  - rows: 200
- three tables "data"
  - rows: 10, 13 and 0.3 million
  - indexes: docIds
relations:
- users to docs: 1 to n
- classifications to docs: 1 to n
- docs to data-tables: 1 to n

To select the complete records I am actually on following statements:

Server-Execution-Time 16 seconds

SELECT * FROM (
    select * from docs 
    where userID in (
        select distinct userID from users where softId like '...'
    )
) as doc
LEFT JOIN users on users.userID = doc.userId
LEFT JOIN classifications on classifications.id = doc.classificationId
LEFT JOIN data1 on data1.docId = doc.docId
LEFT JOIN data2 on data2.docId = doc.docId
LEFT JOIN data3 on data3.docId = doc.docId;

Updated - now 15 seconds

SELECT
docID, calssificationId, classificationDescription,
userId, softId, forename, lastname, birthdate,
data1.id, data1.date, data2.id, data2.date, data3.id, data3.date,
FROM docs
JOIN users on users.userID = doc.userId AND softId like '...'
LEFT JOIN classifications on classifications.id = doc.classificationId
LEFT JOIN data1 on data1.docId = doc.docId
LEFT JOIN data2 on data2.docId = doc.docId
LEFT JOIN data3 on data3.docId = doc.docId;

execution plans

execution plan

Server-Execution-Time 17 seconds

DECLARE @userIDs table( id bigint );
DECLARE @docIDs table( id bigint );

insert into @userIDs select userID from users where softId like '...';
insert into @docIDs select docId from docs where userId in ( select id from @userIDs);
SELECT * FROM users where userID in ( select id from @userIDs);
SELECT * FROM docs where docID in (select id from @docIDs);
SELECT * FROM data1 where data1.docId in (select id from @docIDs);
SELECT * FROM data2 where data2.docId in (select id from @docIDs);
SELECT * FROM data3 where data3.docId in (select id from @docIDs);
GO

Updated - now 14 seconds

DECLARE @userIDs table( id bigint, softId varchar(12), birthdate varchar(8) );
DECLARE @docIDs table( id bigint, classification bigint, capture_date datetime, userId bigint, lastname varchar(50), forename varchar(50) );


INSERT INTO @userIDs select userID, softId, birthdate from users where softId like '...';
INSERT INTO @docIDs select docID, classification, capture_date, userID, lastname, forename from docs where userID in ( select id from @userIDs);

SELECT * FROM @userIDs;
SELECT * FROM @docIDs;

SELECT [only needed fields] FROM data1 where docID in (select id from @docIDs);
SELECT [only needed fields] FROM data2 where docID in (select id from @docIDs);
SELECT [only needed fields] FROM data3 where docID in (select id from @docIDs);

execution plans

General Updates @AntonínLejsek suggested to define the docId of documents as a clustered index and the pkId as non-clustered. This changed the execution-time as follow:

Join-Statement: -1 second
Multi-Select-Statement: -5 seconds

I checked the indexes again and changed the included columns, now they have the execution-time:

Join-Statement: 4 seconds
Multi-Select-Statement: 6 seconds

The "simple" question

Do somebody have suggestions to reduce the executiontime?

Dear, the first: Why use distinct in your subquery select distinct userID from users where softId like '...'. userID is not primary key for users table? — Joe Taras, Oct 06 '16 at 12:22
@JoeTaras : you are right, the distinct is unnecessary. I removed it but it has no impact. — coivip, Oct 06 '16 at 12:40
Is a `select *` appropriate here or do you only need to retrieve specific fields? Can you post the execution plan? — alroc, Oct 06 '16 at 12:59
@alroc I had reduced the number of fields by changing somthing else, but didnt changed the post, sorry - the executionplans are linked in now — coivip, Oct 06 '16 at 14:19

score 2 · Answer 1 · answered Oct 06 '16 at 12:23

2

I would phrase the logic as:

I would get rid of the first subquery and just do the necessary work on the users table:

SELECT *
FROM docs JOIN
     users
     ON users.userID = doc.userId AND softId LIKE '...'  LEFT JOIN
     . . .

The logic in the IN is unnecessary if you are doing a JOIN anyway.

Note: This might not help much, because your query appears to be returning lots of data, both in columns and rows.

answered Oct 06 '16 at 12:23

Gordon Linoff

1,242,037
58
646
786

I changed it as suggested and we have now 15 sec. execution time and will update the state at the main post - thank you. The query returns usually ~1000 rows. – coivip Oct 06 '16 at 12:52
Is `userID` the first key in the clustered index on `users`? If not, you will need a separate index on it. And, you should have indexes on the primary keys of all tables, including `classifications` and so on. You should also double check the indexes on the `data` tables, and time the query by building up the result set one `left join` at a time. 10+ seconds seems to long for the query. – Gordon Linoff Oct 06 '16 at 15:14
userId is non clustered - cause of my comprehension, clustered indexes are trees, so clustering on birthdates does more sence for me, or is this wrong? all tables has primary keys. – coivip Oct 06 '16 at 15:23
@coivip . . . You need for `users(userId)` to be the first key in *some* index, clustered or not. – Gordon Linoff Oct 06 '16 at 23:43
@GordonLinhoff it was the first on a non-clustered index – coivip Oct 07 '16 at 09:07

Antonín Lejsek · Answer 2 · 2016-10-06T16:14:44.160

1

I see two different databases in the plan, I would try to test it in one database first.

The database design is weird. You have clustered index on birthdate. As it is not unique, database has to make up another 4B number for making it unique. So You have 12B key in every nonclustered index, which is space and performance inefficient. You do not even have id included in the nonclustered index, so it has to be looked up, which is time wasting. In most cases You should cluster on primary key and that should be id.

--Deleted-- while softIds is almost unique, this paragraph became irelevant.

edited Oct 06 '16 at 16:14

answered Oct 06 '16 at 15:15

Antonín Lejsek

6,003
2
16
18

I know, the second database is a testing databse where I merged some tables to reduce the joins in the queries. I have 10120000 userIds and 10100000 softIds. I understood the clustered as a tree, so the birthdate made more sence for me as a clustered index and the userIds and softIds as nonclustered - or did I understood this in the wrong way? Usually I have to search like '123456%' - the max length is 12 – coivip Oct 06 '16 at 15:33
@coivip Thank You, moving softId to another table would not be good idea in this case then. For clustering index please have a look here: http://stackoverflow.com/questions/4332982/do-clustered-indexes-have-to-be-unique – Antonín Lejsek Oct 06 '16 at 16:16
I will have a look at the link later at home. I'm out of office now and have no chance to change that until tomorrow morning. When I did changes, I will publish them. – coivip Oct 06 '16 at 16:31
I switched the types of index between userId and and docId in dos. At the join version it has no impact on the execution-time, at the multi select statement is decreases 2 seconds of the execution-time – coivip Oct 07 '16 at 09:09

score 0 · Answer 3 · answered Oct 06 '16 at 15:01

0

Add clustered indexes on your table variables by defining primary keys.

DECLARE @userIDs table( id bigint primary key, softId varchar(12), birthdate varchar(8) );
DECLARE @docIDs table( id bigint primary key, classification bigint, capture_date datetime, userId bigint, lastname varchar(50), forename varchar(50) );

answered Oct 06 '16 at 15:01

Bruce Dunwiddie

2,888
1
15
20

same execution time – coivip Oct 06 '16 at 15:38

Increase MS SQL transaction performance

3 Answers3