6

So far, I am able to extract the list of database table using the below SQL query:

SELECT
    DISTINCT
    TABLE_SCHEMA,
    TABLE_NAME
FROM
    INFORMATION_SCHEMA.COLUMNS

In each of these table, the first column is named "Year". The values are from year "2011" to year "2017":

CREATE TABLE foo (
    [Year] int,
    AnotherColumn varchar(50),
    ...
)

CREATE TABLE bar (
    [Year] int,
    SomeOtherColumn guid,
    ...
)

CREATE TABLE ...

Now, I would need to count the number of rows of different years in each of the tables, and display the output in the format below:

| TABLE_SCHEMA | TABLE_NAME | 2011                | 2012                | ... | 2017                |
|:-------------|-----------:|:-------------------:|:-------------------:|:----|:-------------------:|
| SCHEMA       | foo        | no. of rows of 2011 | no. of rows of 2012 | ... | no. of rows of 2017 | 
| SCHEMA       | bar        | no. of rows of 2011 | no. of rows of 2012 | ... | no. of rows of 2017 | 
| SCHEMA       | ...        | no. of rows of 2011 | no. of rows of 2012 | ... | no. of rows of 2017 | 

Does anyone have any suggestions? Thanks a lot!

Chris Wong
  • 63
  • 3

1 Answers1

4

While every SQL implementation offers some form of value parameterization, no such facility exists to parameterize object identifiers (e.g. table names, column names, etc) - which means you must resort to Dynamic-SQL, which introduces its own risks (namely SQL injection).

With your specific problem, we can start by trying to solve it without Dynamic-SQL, by assuming a known and fixed set of tables to query, then we can convert it to Dynamic-SQL, hopefully in a safe manner:

SELECT
    'Table1' AS TableName
    [Year],
    COUNT(*) AS YearRowCount
FROM
    Table1
GROUP BY
    [Year]

UNION ALL

SELECT
    'Table2' AS TableName
    [Year],
    COUNT(*) AS YearRowCount
FROM
    Table2
GROUP BY
    [Year]

UNION ALL

...

Hopefully you're seeing a pattern here.

This query, so far, will give us results of this form:

TableName    Year    YearRowCount
'Table1'     2017            1234
'Table1'     2016            2345
'Table1'     2015            3456
'Table1'     2014            1234
'Table1'     2013            1234
'Table1'     2011            1234
'Table2'     2017            1234
'Table2'     2016            2345
'Table2'     2015            3456
'Table2'     2013            1234
'Table2'     2012            1234
'Table2'     2011            1234
...

We can then use PIVOT to transpose the rows into columns. PIVOT (and UNPIVOT) do require you to explicitly name each column to be transposed, unfortunately - but it would be nice if they had a PIVOT ALL feature or something).

SELECT
    tableName,
    YearRowCount,
    [2011], [2012], [2013], [2014], [2015], [2016], [2017]
FROM
(
    -- our UNION query goes here --
)
PIVOT
(
    SUM( YearRowCount )
    FOR [Year] IN ( 2011, 2012, 2013, 2014, 2015, 2016, 2017 )
)

So now we know the pattern of the inner query and the PIVOT statement to surround it, we can make it dynamic.

There are 3 approaches for generating Dynamic SQL on a "for each row..." basis. The first is to use a CURSOR, the second is to use some kind of T-SQL loop (WHILE, etc) - both of these approaches take the iterative approach - but there's a 3rd version which is more functional and syntactically simpler. I'll demonstrate this functional approach.

Also, we can avoid the uglier parts of manual string concatenation by using (abusing) the FORMATMESSAGE function which serves as a sprintf implementation. To use FORMATMESSAGE to format a string requires SQL Server 2016 or later (though Compatibility Level does not need to be 130 as far as I can tell). If you are running an earlier version you'll need to use CONCAT or 'foo' + @var + 'bar'-style concatenation.

I'm also using the COALESCE( [aggregate] + [separator], '' ) + [value] trick described in this answer: https://stackoverflow.com/a/194887/159145 - it's one method of concatenating (aggregating) row values, though it feels a bit ugly. Remember that SQL is primarily concerned with relational algebra of unordered sets of tuple data (i.e. tables), which does not normally cover view-level concerns like ordering or aggregating sorted data - which is what concatenation is.

DECLARE @unionTemplate varchar(1024) = '
SELECT
    ''%s.%s'' AS TableName
    [Year],
    COUNT(*) AS YearRowCount
FROM
    [%s].[%s]
GROUP BY
    [Year]
'

DECLARE @unionSeparator varchar(20) = '
UNION ALL
'

DECLARE @unionQuery varchar(max)

SELECT
    @unionQuery = COALESCE( @unionQuery + @unionSeparator, '' ) + FORMATMESSAGE( @unionTemplate, SCHEMA_NAME, TABLE_NAME, SCHEMA_NAME, TABLE_NAME )
FROM
    INFORMATION_SCHEMA.TABLES
ORDER BY
    SCHEMA_NAME,
    TABLE_NAME

Anyway, this query will generate the query stored in @unionQuery, so now we just need to compose it...

DECLARE @pivotQuery varchar(max) = '
SELECT
    tableName,
    YearRowCount,
    [2011], [2012], [2013], [2014], [2015], [2016], [2017]
FROM
(
    %s
)
PIVOT
(
    SUM( YearRowCount )
    FOR [Year] IN ( 2011, 2012, 2013, 2014, 2015, 2016, 2017 )
)'

SET @pivotQuery = FORMATMESSAGE( @pivotQuery, @unionQuery )

...and execute it (EXEC sp_executesql is preferred over the archaic EXEC()) - also note that EXEC() is not the same thing as EXEC!

EXEC sp_executesql @pivotQuery

ta-da!

Older SQL Server versions (2014, 2012, 2008 R2, 2008):

These are untested, but if you need to run on SQL Server versions older than 2016 (v13.0), try these alternatives to FORMATMESSAGE:

DECLARE @unionQuery nvarchar(max)

SELECT
    @unionQuery =
        COALESCE( @unionQuery + ' UNION ALL ', '' ) +
        CONCAT(
            'SELECT ''',
            SCHEMA_NAME, '.', TABLE_NAME, '[Year],
    COUNT(*) AS YearRowCount
FROM
    [', SCHEMA_NAME, '].[', TABLE_NAME, ']
GROUP BY
    [Year]
'
    )
FROM
    INFORMATION_SCHEMA.TABLES
ORDER BY
    SCHEMA_NAME,
    TABLE_NAME

As the @pivotQuery is only inserted once, it's okay to use REPLACE to insert the inner @unionQuery, but never do this when dealing with user-provided values because you open yourself up to SQL injection-like attacks:

SET @pivotQuery = REPLACE( @pivotQuery, '%s', @unionQuery )
Dai
  • 141,631
  • 28
  • 261
  • 374
  • thanks for illustrating. I try the upper suggestion. After all those DECLARE, it show error message when executing "SET _pivotQuery = FORMATMESSAGE( _pivotQuery, _unionQuery )". The error message is "String or binary data would be truncated. [SQL State=22001, DB Errorcode=8152]" – Chris Wong Sep 08 '17 at 06:56
  • @ChrisWong You might want to use the `REPLACE` approach (at the bottom of my answer) instead of the final `FORMATMESSAGE` - it's likely your query is longer than 2000 characters (which is `FORMATMESSAGE`'s internal limit). – Dai Sep 08 '17 at 06:59
  • @ChrisWong Also, are you executing it in SSMS? If so, you might want to use the Step-through Debugger to ensure nothing is being truncated unnecessarily. – Dai Sep 08 '17 at 07:00
  • I am using SQL Workbench/J. I am a end-user extracting the production data from database. – Chris Wong Sep 08 '17 at 07:04
  • @ChrisWong SQL Workbench/J does not seem to include a step-through debugger for Microsoft SQL Server. I recommend you download and install SSMS from Microsoft's website and use that instead. It's a freeware download: https://learn.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms – Dai Sep 08 '17 at 07:32
  • Following your idea at the top, I success by using your older SQL version and changing for several syntax. Many thanks again! – Chris Wong Sep 08 '17 at 10:29