How can I convert rows to columns in SQL

Question

I've been referencing this question a lot, but my case is a little different so I haven't quite figured it out.

I have a set of data that looks something like this:

    --------------------------------------
    | Id | Answer|  Question   | EntryBy
    --------------------------------------
    | 1  |John   |Name?        | User1  |
    | 2  |2.4    |Raiting?     | User1  |
    | 3  |ZH1E4A |UserId?      | User1  |
    | 4  |Paul   |Name?        | User1  |
    | 5  |2.3    |Raiting?     | User1  |
    | 6  |Ron    |Name?        | User2  |
    | 7  |857685 |UserId?      | User2  |
    ----------------------------

I need to pivot the data so that it's structured like so:

    ----------------------------------------------------------
    | Category | Name? | Raiting? | UserId? | EntryBy |
    ----------------------------------------------------------
    | Category1| John  |   2.4    | ZH1E4A  | User1   |
    | Category1| Paul  |   2.3    |  NULL   | User1   |
    | Category1| Ron   |   NULL   |  857685 | User2   |

As you can see, there are multiple "Questions" but they don't always have an answer/value. I know the exact number of questions that may be asked/answered so I'm assuming that may help if I used a CASE expression?

Note: The 'Category' column in the last table is just another value similar to 'EntryBy' in the first. I've attempted the pivot approach in the cited question, but the results I get are not correct. I also tried the CASE statement but it resulted in an error since the Questions are titled the same.

How you tell what data relates to what? For example, what says that `John` and `2.4` go together? Why not `John` and `2.3`? I can't see any kid of group identifier. — Thom A, Jan 08 '18 at 16:14
"not correct" is extremely unhelpful. Please show the code you tried, the results you got, the results you expect, etc. — MatBailie, Jan 08 '18 at 16:14
Is it a fixed list of values that need to be pivoted in to columns? *(Always `'Name?'`, `'Raiting?'`, `'UserId?'`, `'EntryBy'`?)* — MatBailie, Jan 08 '18 at 16:15
Are there any other fields that can be used to determine that rows 1-3 aggregate to one output row, then rows 4-5 for the next output row, etc? Or are you inferring this Solely based on the fact that the id's must be sequential and `'Name?'` always comes first? *(Which is a very weak / dangerous assumption as IDENTITY columns, for example, are not **guaranteed** to be sequential and in a set order.)* — MatBailie, Jan 08 '18 at 16:18
@Larnu - Good question. The only thing that somewhat ties the items together is the ID sequence. To truly know which data should be grouped... the only way I've been able to think of is a combination of the ID sequence, Category, and EntryBy — Johnny Gamez, Jan 08 '18 at 16:28
@MatBailie - I will post code shortly. And yes, I'm pretty much relying on the ID right now to determine what should return as a row. The list of questions is always in the same order. — Johnny Gamez, Jan 08 '18 at 16:29
The problem, therefore, is what @MatBailie raised. Sequence isn't always reliable. How do we know when the next sequence begins? Does `Question` having the value `Name?` denote it? We could, likely, do something with that, but it's may be a little hit and miss (just a warning). — Thom A, Jan 08 '18 at 16:31
Yes, we could strongly assume that `Name?` should start a new row. I realize that this data isn't structured as it should be, but it's what I was given to work with :/ I'll take hit & miss for now. — Johnny Gamez, Jan 08 '18 at 16:33
There is also a timestamp column that may be used to identify the sequence but even that seems weak to me. Each row in the first table has a timestamp that is only a few seconds apart for each sequence that should be grouped. — Johnny Gamez, Jan 08 '18 at 16:36
Is this a description of the real world case? User1 and User2 are both entering data in to some system, and they're doing so at the same time. They each enter all the available data for one "application user", then move on to another "application user"? This means that User2 may enter a Name for an "Application user" between User1 entering a Name and a Raiting? But, when looking at just User1 or just User2, they always finish one "Application User" before starting the next? *(In which case the Timestamp is actually pertinent here.)* — MatBailie, Jan 08 '18 at 17:03
Note : My previous comment highlights why it is important to include *(or find out)* as much information about the behaviour of the data as is possible. — MatBailie, Jan 08 '18 at 17:06

score 2 · Accepted Answer · answered Jan 08 '18 at 16:39

Being 2008, we lose the sum() over function, but we can simulate it via a cross apply to create a Grp indicator.

This also assumes the ID is sequential (risky) and Name? is the Group Key.

Also, check the spelling of RAITING

Also, I have no idea where Category is coming from

Example

Select [Name?]    = max(case when Question = 'Name?'   then Answer end)
      ,[Raiting?] = max(case when Question = 'Raiting?' then Answer end)
      ,[UserId?]  = max(case when Question = 'UserId?' then Answer end)
      ,[EntryBy?] = max([EntryBy])
 From (
        Select A.*
              ,B.Grp
         From YourTable A
         Cross Apply (Select Grp=count(*) from YourTable where Question='Name?' and ID<=A.ID) B
      ) A
 Group By Grp

Returns

Name?       Raiting?    UserId?  EntryBy?
John        2.4         ZH1E4A   User1
Paul        2.3         NULL     User1
Ron         NULL        857685   User2

I believe that it works, but on large data-sets that triangle-join in the `CROSS APPLY` is going to be expensive :( — MatBailie, Jan 08 '18 at 16:51
I was able to implement this the first try! I wasn't too concerned about efficiency since the data-set isn't very large. — Johnny Gamez, Jan 09 '18 at 16:00

Thom A · Answer 2 · 2018-01-08T16:53:48.440

This only does a single parse of the table (or "Values Table Expression") for this one, compared to John's, which does 2:

WITH VTE AS (
    SELECT *
    FROM (VALUES
                (1,'John  ','Name?   ','User1'),
                (2,'2.4   ','Raiting?','User1'),
                (3,'ZH1E4A','UserId? ','User1'),
                (4,'Paul  ','Name?   ','User1'),
                (5,'2.3   ','Raiting?','User1'),
                (6,'Ron   ','Name?   ','User2'),
                (7,'857685','UserId? ','User2'),
                (8,'Steve  ','Name?   ','User3'),
                (9,'2.5   ','Raiting?','User3'),
                (10,'Jane  ','Name?   ','User3'),
                (11,'GA18S1','UserId? ','User3'),
                (12,'2.3   ','Raiting?','User3'),
                (13,'ABH12D','UserId? ','User3')) V(ID, Answer, Question, EntryBy)),
Groups AS(
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY ID ASC) -
           ROW_NUMBER() OVER (PARTITION BY CASE WHEN Question = 'Name?' THEN 0 ELSE 1 END ORDER BY ID ASC) AS Grp
    FROM VTE)
SELECT 'Category1' AS Category,
       MAX(CASE Question WHEN 'Name?' THEN Answer ELSE NULL END) AS [Name?],
       MAX(CASE Question WHEN 'Raiting?' THEN Answer ELSE NULL END) AS [Raiting?],
       MAX(CASE Question WHEN 'UserID?' THEN Answer ELSE NULL END) AS [UserID?],
       EntryBy
FROM Groups
GROUP BY CASE Grp WHEN 0 THEN Grp + 1 ELSE Grp END,
         EntryBy
ORDER BY CASE Grp WHEN 0 THEN Grp + 1 ELSE Grp END;

I also added a few extra values to display what happens if the sequencing goes wrong.

Result set:

Category  Name?   Raiting? UserID? EntryBy
--------- ------- -------- ------- -------
Category1 John    2.4      ZH1E4A  User1
Category1 Paul    2.3      NULL    User1
Category1 Ron     NULL     857685  User2
Category1 Steve   2.5      NULL    User3
Category1 Jane    2.3      GA18S1  User3

Also (to the OP) note that Steve has a value of `NULL` for `[UserID?]`. This is the example where the sequencing was wrong, and Jane appears to have answered the question `UserID?` twice. In this case, only the `MAX` value is returned (which may well be the incorrect value (Jane should have a `[UserID?]` value of `ABH12D`)). If this sequencing cannot be trusted, not can the solution (but I don't have an alternate solution, if we have no reliable sequence/group definition). — Thom A, Jan 08 '18 at 16:57
Adding `EntryBy` to the partitioning may make the assumptions safer. *(seems like multiple data-entry-users concurrently entering data sequentially for different "entities". Entities from different EntryBy users may interleave? But not from the same EntryBy user? If so, ordering by the timestamp the OP commented on -could- be better than using the id?)* — MatBailie, Jan 08 '18 at 16:59
Might do, @MatBailie. The OP, has, however, omitted that from their sample data at the moment, so I'd rather not make assumptions on what that data might look like. My concern, is that the `EntryBy` column could suffer from the same problem anyway (especially if it's using a Default value of `GETDATE()` or similar). Would be worth while for the OP to test and then, if things don't look right, to provide further samples and include the data that doesn't return the expected result set. — Thom A, Jan 08 '18 at 17:03

How can I convert rows to columns in SQL

2 Answers2