0

I am just wondering if anyone can see a better solution to this issue.

I previously had a flat (wide) table to work with, that contained multiple columns. This table has now been changed to a dynamic table containing just 2 columns (statistic_name and value).

I have amended my code to use sub queries to return the same results as before, however I am worried the performance is going to be terrible when using real live data. This is based on the exacution plan which shows a considerable difference between the 2 versions.

See below for a very simplified example of my issue -

CREATE TABLE dbo.TEST_FLAT
(
    ID INT,
    TEST1 INT,
    TEST2 INT,
    TEST3 INT,
    TEST4 INT,
    TEST5 INT,
    TEST6 INT,
    TEST7 INT,
    TEST8 INT,
    TEST9 INT,
    TEST10 INT,
    TEST11 INT,
    TEST12 INT
)

CREATE TABLE dbo.TEST_DYNAMIC
(
    ID INT,
    STAT VARCHAR(6),
    VALUE INT
)

CREATE TABLE dbo.TEST_URNS
(
    ID INT
)

-- OLD QUERY
SELECT D.[ID], D.TEST1, D.TEST2, D.TEST3, D.TEST4, D.TEST5, D.TEST6, D.TEST7, D.TEST8, D.TEST9, D.TEST10, D.TEST11, D.TEST12
FROM [dbo].[TEST_URNS] U
INNER JOIN [dbo].[TEST_FLAT] D
ON D.ID = U.ID

-- NEW QUERY
SELECT U.[ID], 
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST1') AS TEST1,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST2') AS TEST2,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST3') AS TEST3,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST4') AS TEST4,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST5') AS TEST5,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST6') AS TEST6,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST7') AS TEST7,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST8') AS TEST8,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST9') AS TEST9,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST10') AS TEST10,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST11') AS TEST11,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST12') AS TEST12
FROM [dbo].[TEST_URNS] U

Note this is in SQL2008 R2 and this will be part of a stored procedure, the flat version of the table contained hundreds of thousands of records (900k or so at last count).

Thanks in advance.

user1948635
  • 1,357
  • 4
  • 15
  • 22
  • Looks like **[EAV](http://stackoverflow.com/questions/1336449/eav-over-sql-server)** – Lukasz Szozda Nov 16 '15 at 16:57
  • It will also depend on how much data do you plan adding to it.If there is millions of new rows every month the EAV(2 cols) model has scaling issues – Mihai Nov 16 '15 at 16:58

3 Answers3

1

Create an index on the STAT column of TEST_DYNAMIC, for quick lookups.

But first consider redesigning TEST_DYNAMIC changing STAT varchar(6) to STAT_ID int (referencing a lookup table) Then on TEST_DYNAMIC, create an index on STAT_ID which will run quite a bit faster than an index on a text field.

1

Create your TEST_DYNAMIC and TEST_URNS tables like this:

CREATE TABLE [dbo].[TEST_DYNAMIC](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [STAT] [varchar](50) NOT NULL,
    [VALUE] [int] IDENTITY(1,1) NOT NULL,
 CONSTRAINT [PK_TEST_DYNAMIC] PRIMARY KEY CLUSTERED 
(
    [ID]
))

CREATE TABLE dbo.TEST_URNS
(
    ID [int] IDENTITY(1,1) NOT NULL
)
CONSTRAINT [PK_TEST_URNS] PRIMARY KEY CLUSTERED 
(
    [ID]
))

If you notice after a period of time that performance becomes poor, then you can check the index fragmentation:

SELECT a.index_id, name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(dbo.TEST_DYNAMIC'),
 NULL, NULL, NULL) AS a
    JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
GO

Then you can rebuild the index like so:

ALTER INDEX PK_PK_TEST_DYNAMIC ON dbo.TEST_DYNAMIC
REBUILD;
GO

For details please see https://msdn.microsoft.com/en-us/library/ms189858.aspx

Also, I like @Brett Lalonde's suggestion to change STAT to an int.

user8128167
  • 6,929
  • 6
  • 66
  • 79
  • Thanks for the response. I am not actually able to modify the table structures, but Ill request that indexes are created as above. – user1948635 Nov 17 '15 at 10:40
  • 1
    OK, great. Clustered indexes sort and store where the data rows in the table or view are physically stored on disk based on their key values, so you will likely have better performance using this type of index. See https://msdn.microsoft.com/en-CA/library/ms190457.aspx – user8128167 Nov 17 '15 at 16:26
1

The only way to really know is to try it out. In general, modern hardware should be able to support either query with little noticable impact in performance as long as you are indexing both tables correctly (you'll probably need an index on ID and STAT).

If you have 900K entities and 12 attributes, you have around 10 million rows; that should be fine on a decent serer. Eventually, you may run into performance problems if you add many records every month.

The bigger problem is that the example queries you paste are almost certainly not what you'll end up running in your real queries. If you have to filter and/or compare TEST5 with TEST6 on your derived table, you don't benefit from the additional indexing you could do if they were "real" columns.

You could then come full circle and implement your EAV table as an indexed view.

Community
  • 1
  • 1
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52