3

In the table below, I have a variable number of columns, and that number is in the 1000s. I need to sum all the values of each of the 1000 columns grouped by the person's name. So, smith's total test_score_1, total test_score_2,...total test_score_1000. And then Jackson's total test_score_1, total test_score_2,...total test_score_1000.

I don't know the number of 'test_score_n' columns beforehand and they are always changing.

So given this table:

name      test_score_1 test_score_2 ...  test_score_1000
  smith        2              1                 0
  jackson      0              3                 1
  jackson      1              1                 2
  jackson      3              0                 3
  smith        4              5                 1

How can I produce the table below?

name      test_score_1 test_score_2 ...  test_score_1000
  smith        6              6                1
  jackson      4              4                6
Steve Chambers
  • 37,270
  • 24
  • 156
  • 208
user798719
  • 9,619
  • 25
  • 84
  • 123
  • 18
    You have 1000 columns?! First task: fix that madness immediately. – DavidG Jul 28 '17 at 12:26
  • With biological data it is easy to have more than 1024 columns. One could argue that turning data wide should not be done in database management system but working with larger than ram data in R is painful mainly because the algorithms I need do not learn partially in chunks. I prefer to generate the dataset in sqlserver and use R or python for the data science part. – user798719 Aug 02 '17 at 01:28
  • I'm still not convinced - I would still normalise the data into multiple table. Also, I would argue that if you still absolutely need 1000 columns, that a SQL database is probably not a good place to store the data. – DavidG Aug 02 '17 at 08:17

6 Answers6

7

SQL to generate the SQL

DECLARE @generatedSQL nvarchar(max);

SET @generatedSQL = (

SELECT
    'SELECT ' + 
    SUBSTRING(X.foo, 2, 2000) + 
    'FROM ' + 
    QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name) +
    ' GROUP BY name' --fix this line , edited
FROM
    sys.tables t
    CROSS APPLY 
    (
    SELECT
        ', SUM(' + QUOTENAME(c.name) + ')'
    FROM 
        sys.columns c 
    WHERE 
        c.object_id = t.object_id
        AND
        c.name <> 'Name'
    FOR XML PATH('')
    ) X (foo)
WHERE
    t.name = 'MyTable'
 );

EXEC (@generatedSQL);
gbn
  • 422,506
  • 82
  • 585
  • 676
  • Hi what does select 'select' mean? Could I add a group by clause to this code? I want to group by name (please see amendmended question) and then sum up the 1000 columns for each name. Thanks! – user798719 Jul 29 '17 at 04:41
  • 1
    @gbn - I think you need to exclude `name` column from your `CROSS APPLY` – Alex Jul 31 '17 at 00:47
  • @user798719 - I suggest you check Google for "what is dynamic SQL" – Alex Jul 31 '17 at 00:48
  • @Alex why do you say that? I need it to generate the CSV – gbn Jul 31 '17 at 06:38
  • @user798719 The SQL needs generated because you decided to have 1000 columns. So I generate it. – gbn Jul 31 '17 at 06:39
  • @gbn I think what Alex means is `AND c.name <> 'name'` in the `WHERE` clause so the `name` column isn't included in the sums. Very minor further update - brackets are needed around the identifier in the last line, i.e. `EXEC (@generatedSQL)`. – Steve Chambers Jul 31 '17 at 12:44
  • 1
    @SteveChambers thanks, got it., The group by was added afterwards – gbn Jul 31 '17 at 12:50
4

Demo: http://rextester.com/MAFCP19297

SQL

DECLARE @cols varchar(max), @sql varchar(max);

SELECT @cols = 
     COALESCE(@cols + ', ', '') + 'SUM(' + COLUMN_NAME + ') AS ' + COLUMN_NAME
     FROM INFORMATION_SCHEMA.COLUMNS
     WHERE table_name = '<tbl name>'
       AND COLUMN_NAME <> 'name'
       -- The AND below may be optional - see "Additional Notes #1"
       AND TABLE_CATALOG = '<database schema name>';

SET @sql = 'SELECT name, ' + @cols + ' FROM tbl GROUP BY name;';

EXEC (@sql); 

Explanation

  1. The DECLARE creates two variables - one for storing the column summing part of the SQL and the other for storing the whole dynamically created SQL statement to run.
  2. The SELECT queries the INFORMATION_SCHEMA.COLUMNS system table to get the names of all the columns in tbl apart from the name column. (Alternatively the sys tables could be used - answers to this question discuss the relative merits of each). These row values are then converted into a single comma separated value using this method (which is arguably a little simpler than the alternative FOR XML PATH ('') method). The comma-separated values are a bit more than just the column names - they SUM over each column name and then assign the result with an alias of the same name.
  3. The SET then builds a simple SQL statement that selects the name and all the summed values - e.g: SELECT name, SUM(test_score_1) AS test_score_1, SUM(test_score_2) AS test_score_2, SUM(test_score_1000) AS test_score_1000 FROM tbl GROUP BY name;.
  4. The EXEC then runs the above query.

Additional Notes

  1. If there is a possibility that the table name may not be unique across all databases then the following clause is needed in the select: AND TABLE_CATALOG = '<database schema name>'
  2. My initial answer to this question was mistakenly using MySQL rather than SQL Server - this has now been corrected but the previous version is still in the edit history and might be helpful to someone...
Steve Chambers
  • 37,270
  • 24
  • 156
  • 208
1

Change tablename with your tablename.

     Declare @query as nvarchar(MAX) = (SELECT
    'SELECT name,' + SUBSTRING(tbl.col, 2, 2000) + ' FROM ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name) + 'Group By name'
FROM
    sys.tables t
    CROSS APPLY 
    (
    SELECT
        ', SUM(' + QUOTENAME(columns.name) + ') as ' + columns.name
    FROM 
        sys.columns columns 
    WHERE 
        columns.object_id = t.object_id and columns.name != 'name'
    FOR XML PATH('')
    ) tbl (col)
WHERE
    t.name = 'tablename')
select @query EXECUTE(@query)
jignesh patel
  • 964
  • 7
  • 13
  • 1
    this answer seems very inspired by the answer from @gbn – t-clausen.dk Jul 28 '17 at 12:49
  • SELECT ', SUM(' + QUOTENAME(columns.name) + ') as ' + columns.name I am having trouble understanding the code. How do you insert a "sum" to each column as opposed to summing all the columns once? Where would I add a group by clause? I want to group by "name" before I sum each of the 10000 dynamic columns. – user798719 Jul 29 '17 at 08:02
  • I have an error using your code: "SELECT id, SUM([003]) as 003, SUM([008]) as 008, SUM([009]) as 009, SUM([010]) as 010, SUM([011]) ... SUM( FROM [dbo].[t1]Group By name" – user798719 Jul 31 '17 at 10:07
  • I used a small sample that has 362 columns (003, 008, 009, etc are the column names). the generated select statement doesn't show all columns, cutting off many. – user798719 Jul 31 '17 at 10:08
  • @user798719 Please send full query and error detail. Same working in my database. – jignesh patel Jul 31 '17 at 10:13
1

Try this dynamic column generation Sql script

DECLARE @Sql nvarchar(max)

SET @Sql=( SELECT DISTINCT 'SELECT'+ 
                    STUFF((SELECT ', '+  ' SUM( '+ COLUMN_NAME +' ) AS '+ QUOTENAME( COLUMN_NAME )
                    FROM INFORMATION_SCHEMA.COLUMNS Where TABLE_NAME ='Tab1000' 
                    FOR XML PATH (''),type).value('.','varchar(max)'),1,2,'')
                    +' From Tab1000'From INFORMATION_SCHEMA.COLUMNS Where TABLE_NAME ='Tab1000')

EXEC (@sql)
  • I get this error: "Operand data type varchar is invalid for sum operator." Any ideas? – user798719 Jul 28 '17 at 13:28
  • what is the data type of columns in your table –  Jul 28 '17 at 13:30
  • The updated question has a more concrete example. The "test_score" column contents are all INT types. I would like to group by name and then SUM each of the 1000 "test_score" columns by the person' last name. I can do this if I hard code 1000 columns. I just don't know how to do the dynamic sql equivalent. – user798719 Jul 29 '17 at 08:07
  • @Srini131 - you need to exclude `name` column e.g. `Where TABLE_NAME ='Tab1000' AND COLUMN_NAME <> 'Name'` – Alex Jul 31 '17 at 00:50
1

Try the below script

(set the @tableName= [yourTablename] and @nameColumn to the name of the field you want to group by)

    Declare @tableName varchar(50)='totalscores'
    Declare @nameColumn nvarchar(50)='name'

    Declare @query as nvarchar(MAX) ;

    select @query = 'select ' + nameColumn  + cast(sumColumns as nvarchar(max))  + 'from ' + @tableName +' group by ' + nameColumn     from (
    select @nameColumn nameColumn, (SELECT 
            ', SUM(' + QUOTENAME(c.name) + ') ' + QUOTENAME(c.name)
        FROM
            sys.columns c
        WHERE
            c.object_id=t.object_id and c.name != @nameColumn
        order by c.name
        FOR 
            XML path(''), type
     ) sumColumns
     from sys.tables t where  t.name= @tableName
    )t

    EXECUTE(@query)
Eid Morsy
  • 966
  • 6
  • 8
0

GBN's dynamic SQL would be my first choice (+1), and would be more performant. However, if you are interested in breaking this horrible cycle of a 1,000+ columns, consider the following:

Example

Declare @YourTable Table ([col 1] int,[col 2] int,[col 1000] varchar(50))
Insert Into @YourTable Values 
 (2,1,0)
,(4,5,1)

Select Item  = replace(C.Item,'_x0020_', ' ')
      ,Value = sum(C.Value)
 From @YourTable A
 Cross Apply (Select XMLData= cast((Select A.* for XML RAW) as xml)) B
 Cross Apply (
                Select Item   = a.value('local-name(.)','varchar(100)')
                      ,Value  = a.value('.','int') 
                 From  B.XMLData.nodes('/row')  as C1(n)
                 Cross Apply C1.n.nodes('./@*') as C2(a)
                 Where a.value('local-name(.)','varchar(100)') not in ('Fields','ToExclude')
             ) C
 Group By C.Item

Returns

Item        Value
col 1       6
col 2       6
col 1000    1
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66