
I'm calling a stored procedure through the Enterprise Library DAL Application Block, passing a DataTable that is in turn received as a custom table type (@names AS NameTable) in my procedure. The procedure is very slow from the second call onwards, and I'm looking for a different way to implement it so performance is greatly increased.

The Names/HistoricalNames tables are huge (100 million records), and the data passed to them (through the DataTable/table-valued parameter) is around 4 million records.

Basically what it does (needs to do) is the following:

  1. Import @names (which is the DataTable/table-valued parameter):
    • Check whether either the Names or the HistoricalNames table contains any of the names in the new table parameter; if so, skip the entire import and return 2.
    • Otherwise insert all records from @names into Names and return 1.

The tables look like this:

create table Names
(
    id int IDENTITY(1,1) NOT NULL,
    name nvarchar(20),
    otherId uniqueidentifier
)

create table HistoricalNames
(
    id int IDENTITY(1,1) NOT NULL,
    name nvarchar(20),
    otherId uniqueidentifier
)

The table-valued parameter (@names) uses this table type:

create type NameTable as table
(
    name nvarchar(20),
    otherId uniqueidentifier
)

This is the procedure:

GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[sp_ImportNames]
    @names NameTable READONLY
AS
BEGIN       
    IF ((SELECT COUNT(cd.name) FROM Names as cd WHERE cd.name IN (SELECT c.name FROM @names as c)) > 0)
        BEGIN
            SELECT 2;
        END
    ELSE IF ((SELECT COUNT(cd.name) FROM HistoricalNames as cd WHERE cd.name IN (SELECT c.name FROM @names as c)) > 0)
        BEGIN
            SELECT 2;
        END
    ELSE
        BEGIN
            INSERT INTO Names (name, otherId) SELECT * FROM @names;
            SELECT 1;
        END
END


GO

Can this be easily tuned for performance? Any help would be greatly appreciated!

ReFocus
5 Answers


The table valued parameter is almost certainly your problem.

Table Valued Parameter has slow performance because of table scan

It seems like a lot of data for a basic ETL process to push through a table parameter, but in any case, table-valued parameters aren't indexed.

So you're getting a 4m-row table scan, which is never something you want to see in a relational database.

You should get a massive boost by inserting the data into a REAL table used as a staging area, with an index, and then doing your operation on that table instead of the parameter. Also, make sure you have indexes on the other tables, too.
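A minimal sketch of that staging approach (the table and index names here are illustrative, not from the question):

```sql
-- Illustrative staging setup (names are made up, not from the question).
-- Bulk-load the incoming rows into a real table (e.g. with SqlBulkCopy
-- from .NET), index it, then run the duplicate check against it.
CREATE TABLE StagingNames
(
    name nvarchar(20) NOT NULL,
    otherId uniqueidentifier NOT NULL
);

CREATE INDEX IX_StagingNames_name ON StagingNames (name);

-- Supporting indexes on the big tables so the lookups become seeks:
CREATE INDEX IX_Names_name ON Names (name);
CREATE INDEX IX_HistoricalNames_name ON HistoricalNames (name);
```

With the staging table in place, the duplicate check and the insert run as an ordinary indexed join between permanent tables instead of a scan over the parameter.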

Cade Roux
  • Would it help to select it into a VARIABLE (temp) table? Or should it be a REAL table? – ReFocus Feb 09 '12 at 16:11
  • @ReFocus http://stackoverflow.com/questions/886050/sql-server-creating-an-index-on-a-table-variable I'd just use a real table. I'd also consider maybe possibly reviewing your overall architecture that you are sending 4m rows into a proc versus a traditional ETL process. It's just a very uncommon scenario. – Cade Roux Feb 09 '12 at 16:20
  • We are extracting data from a CSV file and loading this into a DataTable which is sent to the proc. The database and application server are distributed and sending the CSV to the database server is not an option. Might there be another solution where the entire CSV file can be treated as a "batch"? – ReFocus Feb 09 '12 at 16:40
  • 1
    @ReFocus I don't know your architecture or motivations. If that's not possible, that's fine. What we typically did in data warehousing was get compressed files to the SSIS server, then load them into staging tables in the database. Whether that process used inline lookups or a batch SQL operation later would depend on the design. It's just highly unusual to send 4m rows through a proc table parameter interface - and as you see it isn't indexed. If you have requirements for retrying batches, you now have to send 4m rows again across the network. – Cade Roux Feb 09 '12 at 16:53

Maybe something like this:

GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[sp_ImportNames]
    @names NameTable READONLY
AS
BEGIN       
    IF EXISTS(SELECT NULL FROM Names as cd WHERE EXISTS(SELECT NULL FROM @names as c WHERE c.name=cd.name))
        BEGIN
            SELECT 2;
        END
    ELSE IF EXISTS(SELECT NULL FROM HistoricalNames as cd WHERE EXISTS(SELECT NULL FROM @names as c WHERE c.name=cd.name))
        BEGIN
            SELECT 2;
        END
    ELSE
        BEGIN
            INSERT INTO Names (name, otherId) SELECT * FROM @names;
            SELECT 1;
        END
END
Arion
  • This query works great when duplicates are found. However, when no duplicates are found it still hangs for over 120 seconds when trying to insert 50,000 records where the Names table contains 100,000 records... This cannot be the solution, right? – ReFocus Feb 09 '12 at 16:00

Turn on the actual execution plan display - this will show you where the time is being spent.

Jimbo
  • How can I display the execution plan for a 'live' stored procedure? It is called from .NET where the datatable is passed from... – ReFocus Feb 09 '12 at 16:05
  • run the stored procedure from SSMS or get a dba to do it if you don't have access - the plan is visible there. – Jimbo Feb 09 '12 at 16:21

Hm.

  • Combine the two checks into a single "IF NOT EXISTS" statement. You currently calculate the complete count every time, but you are only interested in whether a single matching row exists, which can be determined much faster (the query can stop as soon as one row is found). The EXISTS clause exists for exactly this reason.
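One possible shape for that combined statement (a sketch only; it assumes the tables and table type from the question, and indexes on the name columns):

```sql
-- Sketch: a single IF NOT EXISTS covering both tables.
-- EXISTS can stop at the first matching row instead of counting all of them.
IF NOT EXISTS (SELECT 1 FROM Names AS cd
               WHERE cd.name IN (SELECT name FROM @names))
   AND NOT EXISTS (SELECT 1 FROM HistoricalNames AS cd
                   WHERE cd.name IN (SELECT name FROM @names))
BEGIN
    INSERT INTO Names (name, otherId)
    SELECT name, otherId FROM @names;
    SELECT 1;
END
ELSE
    SELECT 2;
END
```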
TomTom

Putting aside the issue that passing that amount of data around sounds like a bad idea, the approach suggested by Arion is what I would suggest too. You don't need any details about which name(s) matched or where they did, so assuming you have indexes on the name columns, you just want to find the first match and return that one was found.

I'd also check the performance of exists using a join:

if exists(select 1
          from Names exist
          inner join @names newNames on newNames.name = exist.name)
begin
    select 2;
end

Also note that explicitly naming the columns in the insert for the "no match" case would usually be advised:

insert into Names (name, otherId)
select name, otherId
from @names
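Putting those pieces together, the whole procedure body might look something like this (a sketch under the assumption that the name columns are indexed; not tested against the asker's data):

```sql
ALTER PROCEDURE [dbo].[sp_ImportNames]
    @names NameTable READONLY
AS
BEGIN
    -- EXISTS with a join instead of COUNT: stop at the first match.
    IF EXISTS (SELECT 1
               FROM Names cd
               INNER JOIN @names c ON c.name = cd.name)
       OR EXISTS (SELECT 1
                  FROM HistoricalNames cd
                  INNER JOIN @names c ON c.name = cd.name)
    BEGIN
        SELECT 2;
        RETURN;
    END

    -- Explicit column list for the insert.
    INSERT INTO Names (name, otherId)
    SELECT name, otherId
    FROM @names;

    SELECT 1;
END
```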
kaj