-1

I have found methods to identify columns with duplicate values and I can use Select Distinct to find columns with only a single value, but I have a lot of tables with hundreds of columns and sometimes millions of rows. I need to find every column name where that column contains only 0, 0.00, Z, N, or Y. Essentially trying to create a list of useless columns.

How do I iterate select distinct (or something along those lines) across every column, one column at a time, and identify those instances in a table where the column values are the same in every row? Or alternatively, instances where the row count is less than 2.

This seems sort of close but specifying the value isn't going to help: SQL Server SELECT where any column contains 'x'

Group by and Having > 1 sort of works but I would have to run it for each column.

Select Distinct works but I have to run it one column at a time unless there is some better method I am missing.

I got a few of these methods to cooperate but only on numeric columns: Find columns that contain only zeros

This works to identify columns containing only NULL but I can't seem to get it do work for any other value:

DECLARE crs CURSOR LOCAL FAST_FORWARD FOR SELECT name FROM syscolumns WHERE id=OBJECT_ID('TABLENAME')
OPEN crs
DECLARE @name sysname
FETCH NEXT FROM crs INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC('SELECT ''' + @name + ''' WHERE NOT EXISTS (SELECT * FROM TABLENAME WHERE ' + @name + ' IS NOT NULL)')
    FETCH NEXT FROM crs INTO @name
END
CLOSE crs
DEALLOCATE crs

I have started adapting a bit but am hitting errors about incorrect syntax near from:

DECLARE crs CURSOR LOCAL FAST_FORWARD FOR SELECT name FROM syscolumns WHERE id=OBJECT_ID('TABLENAME')
DECLARE @sql NVARCHAR(MAX)
OPEN crs
DECLARE @name sysname
FETCH NEXT FROM crs INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
Create Table #Temp (ID NVARCHAR, Info INT)
SET @sql = N'select distinct ''' + @name + ''' insert into #Temp (ID) from TABLENAME; SELECT COUNT(*) from #Temp Having COUNT(*)<2 insert into  #Temp (Info);' 
EXEC sp_executesql @sql 
DROP TABLE #Temp
FETCH NEXT FROM crs INTO @name
END
CLOSE crs
DEALLOCATE crs
  • The only way you could achieve this would be with dynamic SQL. This, however, seems like an odd requirement. Why do you have need of finding this out? What are you going to do with this information? This sounds like the beginnings, or end, or an [XY Problem](http://xyproblem.info). – Thom A Oct 29 '20 at 21:23
  • I need to reduce columns so we can better line this old database up with a new database. So all of these columns with just zero or 0.00 don't provide any information and make it a lot harder to sift through and line up old data with new. Easy enough on smaller tables with 10 or 20 columns but where there are 200 columns and only 20 of them have useful information it is too much to sort through. – David Overfelt Oct 29 '20 at 21:29

2 Answers2

0

Assuming this is a one-off, it's probably easier and quicker to construct an ad-hoc query like

SELECT
  SUM(CASE WHEN Col1 NOT IN ('0.00', 'Z', 'N', 'Y') THEN 1 ELSE 0 END) AS Col1,
  SUM(CASE WHEN Col2 NOT IN ('0.00', 'Z', 'N', 'Y') THEN 1 ELSE 0 END) AS Col2,
  SUM(CASE WHEN Col3 NOT IN ('0.00', 'Z', 'N', 'Y') THEN 1 ELSE 0 END) AS Col3
FROM
yourtable

Any column with a total >= 1 has useful data.

Notes

  • The above does an implicit group as it's not actually grouping on any fields.
  • The above assumes all columns are strings (e.g., varchar/nvarchar). If there is a mix, you'll need to tweak it by column
  • NULLs also don't count as valid values (e.g., don't add 1 to total)

To get your column list in SSMS, you can do this by

  • scripting the table (right click on table -> script table as -> create table), or
  • by just opening the table on the left, a new query window on the right, and drag the word 'columns' to the right hand side

Then just use your favorite text editor or similar to create your SQL using the template above - replacing Col1, Col2 etc with the actual column names. If you have a mix of string and numerical types, the first option can be used to group them together for constructing the 'IN' expressions.

seanb
  • 6,272
  • 2
  • 4
  • 22
  • Might be workable if I can generate the SUM statements but I think I may have a method for that. Need to do this over pat least five hundred columns, all mixed around as numeric, varchar, etc. – David Overfelt Oct 30 '20 at 00:51
  • One way to do so is to copy the column list into Excel, then in the next column use a formula like `="SUM(CASE WHEN [" & A1 & "] NOT IN ...` - then drag that formula to all rows. If you can get all the column names plus their types (into, say, column A), you can use text functions to make columns B and C into the column name then their general type (string, number, etc) into C, then the D column has the formula above to create the SUM(CASE()) expression based on an IF statement to first determine their type e.g., if numeric, check for = 0; if string, check for IN ('z', 'n', 'y') – seanb Oct 30 '20 at 01:50
0

Here's one way you could go about this. For each column of the specified table, it does a count of rows that match your criteria and, if the column's row count matches that of the table's, the column is added to the XML list for review.

NOTE:

Given your number of columns and possible row counts, this process could take a while.

DECLARE @Columns table ( column_name varchar(255), pk INT IDENTITY(1,1) );
DECLARE 
    @TableName varchar(50) = 'tblClients',
    @Criteria varchar(255) = 'IN (''0'', ''0.00'', ''Z'', ''N'', ''Y'')',
    @sql nvarchar(MAX),
    @rows int;

/* Get the current row count for @TableName */

SET @sql = 'SELECT @count = COUNT( DISTINCT [' + @col + '] ) FROM ' + @TableName + ';';
    EXEC sp_executesql @sql, N'@count int OUT', @count = @rows OUT;

/* Create an XML object to store fields that match criteria */
DECLARE @List xml = '<root><columns></columns></root>';

    -- Insert the table name and number of rows processed.
    SET @List.modify( 
      'insert
        <table>
          <name>{sql:variable("@TableName")}</name>
          <rows>{sql:variable("@rows")}</rows>
          <criteria>{sql:variable("@Criteria")}</criteria>
        </table>
      as first
      into (//root)[1]' 
    );

/* Collect a list of column names for the specified table */

INSERT INTO @Columns ( column_name )
SELECT [name] FROM sys.columns WHERE object_id = OBJECT_ID( @TableName ) ORDER BY column_id;

/* For-each column, check if it contains: [ 0 | 0.00 | Z | N | Y ] */

DECLARE @pk int = 1, @col varchar(255), @count int;
WHILE @pk <= ( SELECT MAX ( pk ) FROM @Columns ) 
BEGIN

    -- Current column.
    SELECT @col = column_name FROM @Columns WHERE pk = @pk;
    
    -- Get count of rows for this column with the specified values.
    SET @sql = 'SELECT @count = COUNT( DISTINCT [' + @col + '] ) FROM ' + @TableName + ' WITH (NOLOCK) WHERE CAST( [' + @col + '] AS varchar(MAX) ) ' + @Criteria + ';';
        EXEC sp_executesql @sql, N'@count int OUT', @count = @count OUT;

    -- Insert the column for review if its count matches the table row count.
    IF @count BETWEEN 1 AND 2
        SET @List.modify(
          'insert 
            <column>
              <name>{sql:variable("@col")}</name>
              <count>{sql:variable("@count")}</count>
            </column> 
          into (//root/columns)[1]'
        );

    -- Next column.
    SET @pk = @pk + 1;

END

/* SELECT @List results */
SELECT @List AS List;

In my case, the following XML is returned. These columns match your criteria of ones that should be reviewed.

<root>
  <table>
    <name>tblClients</name>
    <rows>143</rows>
    <criteria>IN ('0', '0.00', 'Z', 'N', 'Y')</criteria>
  </table>
  <columns>
    <column>
      <name>WebPortal</name>
      <count>143</count>
    </column>
    <column>
      <name>EnforceQTY</name>
      <count>143</count>
    </column>
    <column>
      <name>CheckForPrepaid</name>
      <count>143</count>
    </column>
    <column>
      <name>ReportNet</name>
      <count>143</count>
    </column>
    <column>
      <name>ActiveClient</name>
      <count>143</count>
    </column>
  </columns>
</root>
critical_error
  • 6,306
  • 3
  • 14
  • 16
  • I think this is getting closer. But instead of checking if the row count of the column = the row count of the table, I think I need to look for instances where the distinct row count is less than equal to 2. Fairly regularly every row will be populated with default values all the way down. Posted some code that I am working on in the original question. – David Overfelt Oct 30 '20 at 13:24
  • @DavidOverfelt Simple enough. I've updated my example to do just that. Check it out. – critical_error Oct 30 '20 at 16:20
  • Using `@count BETWEEN 1 AND 2` because zero is returned if no data matched the criteria on a given column. – critical_error Oct 30 '20 at 16:24