
I'm calling a stored procedure through the Enterprise Library DAL Application Block, passing a DataTable that is in turn received as a custom table type (@names AS NameTable) in my procedure. The procedure is very slow from the second call onwards, and I'm looking for a different way to implement it so performance is greatly increased.

The Names/HistoricalNames tables are huge (100 million records), and the data passed to them (through the DataTable/table-valued parameter) is around 4 million records.

Basically what it does (needs to do) is the following:

  1. Import @names (which is the DataTable/table-valued parameter):
    • Check whether either the Names or the HistoricalNames table contains any of the names in the new table parameter; if so, skip the entire import and return 2.
    • Otherwise insert all records from @names into Names and return 1.

The tables look like this:

create table Names
(
    id int IDENTITY(1,1) NOT NULL,
    name nvarchar(20),
    otherId uniqueidentifier
)

create table HistoricalNames
(
    id int IDENTITY(1,1) NOT NULL,
    name nvarchar(20),
    otherId uniqueidentifier
)

The table-valued parameter (@names) uses this table type:

create type NameTable as table
(
    name nvarchar(20),
    otherId uniqueidentifier
)

This is the procedure:

GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[sp_ImportNames]
    @names NameTable READONLY
AS
BEGIN       
    IF ((SELECT COUNT(cd.name) FROM Names as cd WHERE cd.name IN (SELECT c.name FROM @names as c)) > 0)
        BEGIN
            SELECT 2;
        END
    ELSE IF ((SELECT COUNT(cd.name) FROM HistoricalNames as cd WHERE cd.name IN (SELECT c.name FROM @names as c)) > 0)
        BEGIN
            SELECT 2;
        END
    ELSE
        BEGIN
            INSERT INTO Names (name, otherId) SELECT * FROM @names;
            SELECT 1;
        END
END


GO

Can this be easily tuned for performance? Any help would be greatly appreciated!

ReFocus
5 Answers


The table valued parameter is almost certainly your problem.

Table Valued Parameter has slow performance because of table scan

It seems like a lot of data for a basic ETL process to push through a table parameter, but in any case, table-valued parameters aren't indexed.

So you're getting a 4m-row table scan, which is never something you want to see in a relational database.

You should get a massive boost by inserting the data into a REAL table used as a staging area, with an index, and then doing your operation on that table instead of the parameter. Also, make sure you have indexes on the other tables, too.
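A minimal sketch of that staging approach (the table and index names here are illustrative, not from the question):

```sql
-- Illustrative staging setup (names are made up, not from the question).
-- Bulk-load the incoming rows into a real table (e.g. with SqlBulkCopy
-- from .NET), index it, then run the duplicate check against it.
CREATE TABLE StagingNames
(
    name nvarchar(20) NOT NULL,
    otherId uniqueidentifier NOT NULL
);

CREATE INDEX IX_StagingNames_name ON StagingNames (name);

-- Supporting indexes on the big tables so the lookups become seeks:
CREATE INDEX IX_Names_name ON Names (name);
CREATE INDEX IX_HistoricalNames_name ON HistoricalNames (name);
```

With the staging table in place, the duplicate check and the insert run as an ordinary indexed join between permanent tables instead of a scan over the parameter.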

Cade Roux
  • Would it help to select it into a VARIABLE (temp) table? Or should it be a REAL table? – ReFocus Feb 09 '12 at 16:11
  • @ReFocus http://stackoverflow.com/questions/886050/sql-server-creating-an-index-on-a-table-variable I'd just use a real table. I'd also consider maybe possibly reviewing your overall architecture that you are sending 4m rows into a proc versus a traditional ETL process. It's just a very uncommon scenario. – Cade Roux Feb 09 '12 at 16:20
  • We are extracting data from a CSV file and loading this into a DataTable which is sent to the proc. The database and application server are distributed and sending the CSV to the database server is not an option. Might there be another solution where the entire CSV file can be treated as a "batch"? – ReFocus Feb 09 '12 at 16:40
  • 1
    @ReFocus I don't know your architecture or motivations. If that's not possible, that's fine. What we typically did in data warehousing was get compressed files to the SSIS server, then load them into staging tables in the database. Whether that process used inline lookups or a batch SQL operation later would depend on the design. It's just highly unusual to send 4m rows through a proc table parameter interface - and as you see it isn't indexed. If you have requirements for retrying batches, you now have to send 4m rows again across the network. – Cade Roux Feb 09 '12 at 16:53

Maybe something like this:

GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[sp_ImportNames]
    @names NameTable READONLY
AS
BEGIN       
    IF EXISTS(SELECT NULL FROM Names as cd WHERE EXISTS(SELECT NULL FROM @names as c WHERE c.name=cd.name))
        BEGIN
            SELECT 2;
        END
    ELSE IF EXISTS(SELECT NULL FROM HistoricalNames as cd WHERE EXISTS(SELECT NULL FROM @names as c WHERE c.name=cd.name))
        BEGIN
            SELECT 2;
        END
    ELSE
        BEGIN
            INSERT INTO Names (name, otherId) SELECT * FROM @names;
            SELECT 1;
        END
END
Arion
  • This query works great when duplicates are found. However, when no duplicates are found it still hangs for over 120 seconds when trying to insert 50,000 records where the Names table contains 100,000 records... This cannot be the solution, right? – ReFocus Feb 09 '12 at 16:00

Turn on the actual execution plan display - this will show you where the time is being spent.

Jimbo
  • How can I display the execution plan for a 'live' stored procedure? It is called from .NET where the datatable is passed from... – ReFocus Feb 09 '12 at 16:05
  • run the stored procedure from SSMS or get a dba to do it if you don't have access - the plan is visible there. – Jimbo Feb 09 '12 at 16:21

Hm.

  • Combine the two checks into a single "IF NOT EXISTS" statement. You currently calculate the complete count every time, but you are only interested in whether a single matching row exists, which can be determined much faster (the query can stop as soon as one row is found). The EXISTS clause exists for exactly this reason.
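One possible shape for that combined statement (a sketch only; it assumes the tables and table type from the question, and indexes on the name columns):

```sql
-- Sketch: a single IF NOT EXISTS covering both tables.
-- EXISTS can stop at the first matching row instead of counting all of them.
IF NOT EXISTS (SELECT 1 FROM Names AS cd
               WHERE cd.name IN (SELECT name FROM @names))
   AND NOT EXISTS (SELECT 1 FROM HistoricalNames AS cd
                   WHERE cd.name IN (SELECT name FROM @names))
BEGIN
    INSERT INTO Names (name, otherId)
    SELECT name, otherId FROM @names;
    SELECT 1;
END
ELSE
    SELECT 2;
END
```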
TomTom

Putting aside the issue that passing that amount of data around sounds like a bad idea, the approach suggested by Arion is what I would suggest too. You don't need any details about which name(s) matched or where they did, so assuming you have indexes on the name columns, you just want to find the first match and return that one was found.

I'd also check the performance of exists using a join:

if exists(select 1
          from Names exist
          inner join @names newNames on newNames.name = exist.name)
begin
    select 2;
end

Also note that explicitly naming the columns in the insert for the "no match" case would usually be advised:

insert into Names (name, otherId)
select name, otherId
from @names
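Putting those pieces together, the whole procedure body might look something like this (a sketch under the assumption that the name columns are indexed; not tested against the asker's data):

```sql
ALTER PROCEDURE [dbo].[sp_ImportNames]
    @names NameTable READONLY
AS
BEGIN
    -- EXISTS with a join instead of COUNT: stop at the first match.
    IF EXISTS (SELECT 1
               FROM Names cd
               INNER JOIN @names c ON c.name = cd.name)
       OR EXISTS (SELECT 1
                  FROM HistoricalNames cd
                  INNER JOIN @names c ON c.name = cd.name)
    BEGIN
        SELECT 2;
        RETURN;
    END

    -- Explicit column list for the insert.
    INSERT INTO Names (name, otherId)
    SELECT name, otherId
    FROM @names;

    SELECT 1;
END
```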
kaj