1

I'm using the SSMS tool Data Discovery and Classification.

The tool automatically search for columns name like %address%, %name%, %surname%, %e-mail%, %tax%, %zip%, etc... and nicely suggests you what it might be a sensible data.

The fact is that outside from Anglo-Saxon societies the column name is not in English but it can be in French, Spanish, Italian, etc..

So I found a query that could help me out list sensible data based on my language:

SELECT schema_name(tab.schema_id) AS schema_name
    ,tab.name AS table_name
    ,col.name AS column_name
    ,t.name AS data_type
    ,NULL as Data_Preview
FROM sys.tables AS tab
INNER JOIN sys.columns AS col ON tab.object_id = col.object_id
LEFT JOIN sys.types AS t ON col.user_type_id = t.user_type_id
ORDER BY schema_name
    ,table_name
    ,column_id;

Very nice.

But it would be better if I could add a last column called Content that PIVOT the content of each column and SELECT TOP 5 of each one and nicely print it in the last column.

Can you help me archive that?

That would be a DYO Data Discovery and Classification.

EDIT: I might have expressed myself badly.

I'm running mz query against AdventureWorks2019:

schema_name table_name column_name data_type Data_Preview
Person Address AddressID int NULL
Person Address AddressLine1 nvarchar NULL
Person Address AddressLine2 nvarchar NULL
Person Address City nvarchar NULL
Person Address StateProvinceID int NULL
Person Address PostalCode nvarchar NULL

I want to PIVOT the content of each column (let's say the TOP 5) tho the last column

schema_name table_name column_name data_type Data_Preview
Person Address AddressID int 1,2,3,4,5
Person Address AddressLine1 nvarchar 1970 Napa Ct.,9833 Mt. Dias Blv.,7484 Roundtree Drive,9539 Glenside Dr,1226 Shoe St.
Person Address AddressLine2 nvarchar NULL,NULL,NULL,NULL,NULL
Person Address City nvarchar Bothell,Bothell,Bothell,Bothell,Bothell
Person Address StateProvinceID int 79,79,79,79,79
Person Address PostalCode nvarchar 98011,98011,98011,98011,98011

Each data can be separated by a coma or whatever. The result is that I want to have a preview of the data that is actually in the column:

enter image description here

This guys apparently arrived to archive that:

Thank you

Francesco Mantovani
  • 10,216
  • 13
  • 73
  • 113
  • What would really help here is if you could provide an example table with a few rows of sample data and the desired output based on your sample data. – Sean Lange Nov 17 '21 at 16:47
  • Hi @SeanLange, I updated my question. Thank you – Francesco Mantovani Nov 18 '21 at 07:59
  • You are going to have to use dynamic sql for this. I didn't look closely at the solution posted by Nav but that seems likely to be what you need. – Sean Lange Nov 18 '21 at 14:02
  • FYI, the tool is looking for **sensitive** data, not sensible data. I doubt the tool as any idea if the data in your database is sensible. A tool that detects sensible data (or perhaps non-sensible data) would likely be checking your columns have the right data type, such as storing a phone number in a `varchar` or a date in an `int`; neither of which are sensible. – Thom A Nov 20 '21 at 13:15
  • Also, the bounty description is wrong, `STUFF` doesn't do anything other than replace one part of a string with another. The "magic" is done by `FOR XML PATH`; all `STUFF` does in such solutions is remove the first delimiter. – Thom A Nov 20 '21 at 13:55

1 Answers1

2

Updated to support 2016

enter image description here

DROP TABLE IF EXISTS #ColumnsToDisplay

SELECT    ROW_NUMBER () OVER (ORDER BY tab.name) AS Iteration,
          s.name AS SchemaName,
          tab.name AS table_name,
          col.column_id,
          col.name AS column_name,
          t.name AS data_type,
          col.max_length,
          col.precision AS PrecisionNumber,
          CAST(NULL AS VARCHAR(MAX)) AS DataSample
INTO      #ColumnsToDisplay
FROM      sys.tables AS tab
JOIN      sys.schemas AS s
    ON    s.schema_id = tab.schema_id
JOIN      sys.columns AS col
    ON    col.object_id = tab.object_id
LEFT JOIN sys.types AS t
    ON    col.user_type_id = t.user_type_id
     
DECLARE @Iterations       INT = 0,
        @CurrentIteration INT = 1;

SELECT @Iterations = MAX (Iteration)
FROM   #ColumnsToDisplay

WHILE @CurrentIteration <= @Iterations
BEGIN
    DECLARE @CurrentTableName  VARCHAR(100)   = '',
            @CurrentColumnName VARCHAR(100)   = '',
            @DynamicQuery      NVARCHAR(1000) = N''
    DECLARE @Sample VARCHAR(MAX)

    SET @CurrentTableName = '';
    SET @DynamicQuery = N'';
    SELECT @CurrentTableName = CONCAT (ttq.SchemaName, '.', ttq.table_name),
           @CurrentColumnName = ttq.column_name
    FROM   #ColumnsToDisplay AS ttq
    WHERE  ttq.Iteration = @CurrentIteration

    IF (@CurrentTableName = '')
    BEGIN
        SET @CurrentIteration += 1

        CONTINUE
    END

 -- SQL Server 2019
 -- SET @DynamicQuery = CONCAT (N'
 --       SELECT @Sample = STRING_AGG(t.ColumnData,'', '') 
 --       FROM (
 --               SELECT TOP 5  CAST(x.[', @CurrentColumnName, '] AS VARCHAR(MAX)) AS ColumnData 
 --               FROM ', @CurrentTableName, ' AS x 
 --               WHERE x.[', @CurrentColumnName, '] IS NOT NULL
 --       )t')

 -- SQL Server 2016 and lower where Stuff is supported   
    SET @DynamicQuery = CONCAT (N'
    SELECT @Sample =  STUFF((SELECT '', ''+ t.ColumnData  
    FROM (
            SELECT TOP 5 CAST(x.[', @CurrentColumnName, '] AS VARCHAR(MAX)) AS ColumnData 
            FROM ', @CurrentTableName, ' AS x 
            WHERE x.[', @CurrentColumnName, '] IS NOT NULL 
    ) AS t
       FOR XML PATH('''')),1,1,'''')')
        
    EXECUTE sys.sp_executesql @DynamicQuery,
                              N'@Sample VARCHAR(MAX) OUTPUT',
                              @Sample = @Sample OUTPUT

    UPDATE #ColumnsToDisplay
    SET    DataSample = @Sample
    WHERE  Iteration = @CurrentIteration

    SET @CurrentIteration += 1
END

SELECT ctd.Iteration,
       ctd.SchemaName,
       ctd.table_name,
       ctd.column_id,
       ctd.column_name,
       ctd.data_type,
       ctd.max_length,
       ctd.PrecisionNumber,
       ctd.DataSample
FROM   #ColumnsToDisplay AS ctd 
Nav
  • 121
  • 5
  • Thank you @Nav, but what I need is much easier. I have expressed myself badly. I have now updated my question. Thank you – Francesco Mantovani Nov 18 '21 at 08:00
  • your code completes but where I can SELECT the results? – Francesco Mantovani Nov 18 '21 at 14:30
  • Sorry i missed a select statement at the bottom SELECT * from #TablesToQuery – Nav Nov 18 '21 at 14:59
  • The table is still empty – Francesco Mantovani Nov 18 '21 at 15:08
  • I Saw the examples you shared. If you want all columns of a particular table with data aggregation we can tweak the query to do that. I'll try and update my answer – Nav Nov 18 '21 at 15:15
  • I used DisplayName as column name with the data i had. you might need to try a column name you have in your DB. I'll update my code based on your updates – Nav Nov 18 '21 at 15:19
  • From the screenshot it really looks is what I want but unfortunately it returns me an empty table. If I use AdventureWorks and I use `DECLARE @ColumnName VARCHAR(50) = 'Production.Product'` it still returns me an empty table. Can you try to run your code on AdventureWorks? – Francesco Mantovani Nov 18 '21 at 15:33
  • Unfortunately i do not have it in my local. I'll try to do it today. And I updated the above query to do a aggregate by column name per table name given. – Nav Nov 18 '21 at 15:53
  • @FrancescoMantovani I restored the AdventureWorks2019 and ran my query against it. Output is Shown in the image. Updated the Query for your use. It ran well. hope it works for you. – Nav Nov 18 '21 at 19:25
  • It works @Nav, but why using `@TableName`? I tried to remove that but is giving me troubles like `Incorrect syntax near the keyword 'Schema'.`. The idea is to replace the Data Discovery and Classification and run it in bulk for the whole database. I cannot keep changing the table name if I have 100 tables. Thank you – Francesco Mantovani Nov 19 '21 at 09:45
  • So strange, if I remove `DECLARE @TableName VARCHAR(100) = 'Product'` and `WHERE tab.name = @TableName` the query is completed with errors. but then if I execute `select * from #ColumnsToDisplay` the results that I wanted are there. Thank you so much for your help. I still don't understand how to make it run smoothly – Francesco Mantovani Nov 19 '21 at 10:35
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/239392/discussion-between-francesco-mantovani-and-nav). – Francesco Mantovani Nov 19 '21 at 10:43
  • 1
    I tried to modify your code to make it available for SQL Server 2016 following this example: https://learn.microsoft.com/en-us/answers/questions/285445/how-to-get-column-values-in-one-comma-separated-va.html . ` SET @DynamicQuery = CONCAT (N' SELECT @Sample = STUFF((SELECT t.ColumnData,'', '' FROM ( SELECT TOP 5 CAST(x.', @CurrentColumnName, ' AS VARCHAR(MAX)) AS ColumnData FROM ', @CurrentTableName, ' AS x WHERE x.', @CurrentColumnName, ' IS NOT NULL FOR XML PATH('') ), 1, 1, ''))t')` – Francesco Mantovani Nov 19 '21 at 11:03
  • The code is unfortunately not working :( – Francesco Mantovani Nov 19 '21 at 11:03
  • The errors were caused by column names like Schema_name, Primary, Database etc.. which are key words in SQL. Dynamic sql does not understand that so i made a simple tweak to wrap column names by [ set brackets] -> [Primary] which will be treated as a non key word. it works – Nav Nov 19 '21 at 22:10
  • @FrancescoMantovani Updated the Query to give all table output. And Will chat further. Just posted the comment above for clarity – Nav Nov 19 '21 at 22:12
  • 1
    Updated the query to support sql server 2016 – Nav Nov 19 '21 at 23:08
  • 1
    @Nav, use `QUOTENAME` rather than concatenating brackets to enclose columns. That will handle brackets in identifiers. – Dan Guzman Nov 20 '21 at 23:26
  • Good to know. The query works as is. But Will definitely try that in my works. Thanks – Nav Nov 21 '21 at 03:56