How to move database table rows into columns based on a common entry

Question

I have the following database:

This is just a subset of a larger data set that would have been too big to type.

ID	CODE	IDENTIFIERS	COLORS	SHAPES	NUMB
101	24	N	RED	CIRCLE	174
102	32	N	BLUE	SQUARE	155
102	32	N	PURPLE	STAR	223
103	13	Y	RED	SQUARE	143
103	13	Y	GREEN	CIRCLE	123
103	13	Y	BLUE	SQUARE	142

I need the following result:

ID	CODE	IDENTIFIERS	COLORS1	COLORS2	COLORS3	SHAPES1	SHAPES2	SHAPES3	NUMB1	NUMB2	NUMB3
101	24	N	RED			CIRCLE			174
102	32	N	BLUE	PURPLE		SQUARE	STAR		155	223
103	13	Y	RED	GREEN	BLUE	SQUARE	CIRCLE	SQUARE	143	123	142

I tried:

    SELECT ID, 
     CODE, 
     IDENTIFIERS, 
     group_concat(COLORS) AS COLORSS, 
     group_concat(SHAPES) AS SHAPESS, 
     group_concat(NUMB) AS NUMBS 
     FROM ( 
     SELECT a.ID, 
     a.CODE, 
     a.IDENTIFIERS, 
     a.COLORS, 
     a.SHAPES, 
     a.NUMB 
     FROM database AS a 
    ) AS sub 
    GROUP BY ID, CODE, IDENTIFIERS

I got:

ID	CODE	IDENTIFIERS	COLORS	SHAPES	NUMB
101	24	N	RED	CIRCLE	174
102	32	N	BLUE,PURPLE	SQUARE,STAR	155,223
103	13	Y	RED,GREEN,BLUE	SQUARE,CIRCLE,SQUARE	143,123,142

Which is close, but not exactly what I was going for. Any help would be appreciated.

Will there always be a maximum of three input rows per id/code/identifiers combination? — MatBailie, Mar 23 '22 at 22:46
What you're asking for is like a [pivot](https://modern-sql.com/use-case/pivot) but because it's an for unordered lists of unknown sizes it's difficult to know how many columns there need to be, nor what goes into what column. You could have, for example, columns for `red`, `blue`, `green`, and `purple` with `true` or `false` as values. But just color1, color2, color3 is difficult. What are you going to do with this result? Maybe there's a better way. — Schwern, Mar 23 '22 at 22:48
MatBailie - No, there can be any number. I think my max is around 20 for colors, 30 for shapes, and maybe 25 for numbers. A unique ID could have 100 rows associated. This is a much bigger data set, it was just too much to type and I don't have enough reputation to post pictures. Also, the file for my actual data has thousands of rows. — anni, Mar 23 '22 at 22:58
SQL is both strongly and statically typed. That includes the number of columns in a dataset. If you need the number of columns to vary according to the source data, you're heading in a direction SQL intentionally doesn't support. In SQL the correct structure for such dynamically varying data is the one you're starting with. Which begs the question, why do you think you need to pivot the data? (this is a significant anti-pattern.) — MatBailie, Mar 23 '22 at 23:02
Schwern - I was using this resource: (https://stackoverflow.com/questions/7867397/sqlite-merging-rows-into-single-row-if-they-share-a-column) but I had trouble with the JOIN command because there were only 2 columns being affected in that example. The problem with using the true\false for the colors is that the colors are just a stand for a number representing a certain shade. There would probably be 100+ colors columns for each row if so. I'm going to use this result for further data analysis. — anni, Mar 23 '22 at 23:06
MatBailie - I want to have all the data corresponding to a particular ID so I can see how these ID's differ from one another. — anni, Mar 23 '22 at 23:08
You don't need everything on one row to analyse that. Rather than asking about pivoting, I suggest that you ask how to accomplish a specific analysis without needing to pivot it first. What you're describing so far is a ***very bad idea*** and will only make subsequent SQL harder, not easier. — MatBailie, Mar 23 '22 at 23:11
I was thinking I could use my current result (second table) and, for each column being affected, get a max count of the comma separated entries. Then I could append that number of columns to the table and fill them with the comma separated values. I just have no clue how to do this. — anni, Mar 23 '22 at 23:11
SQL doesn't support dynamically varying numbers of columns. You can't think of a way to add the number of columns you need, because SQL ***intentionally*** doesn't work like that. You're trying to hammer a square peg in to a round hole, using a screw driver. — MatBailie, Mar 23 '22 at 23:13
MatBaile - I'm new to SQL, can you help me with where this could go wrong? I'm hoping to switch to R after I get my database. I would have done this in R but my file is too big (~5gb). I'd just like to have all my data sorted and organized and take from it as I need it. I was recommended to use SQL because of the size. — anni, Mar 23 '22 at 23:16
MatBaile - The number of affected columns will not be larger than the number of rows of a unique ID. I would like to get this precise but could live with it being larger. Would it be possible to find this number and use it for all affected columns? — anni, Mar 23 '22 at 23:18
Learn how to use the data in its current form. If there is an analysis you need to do but can't work out how, ask that question. But don't try to pivot the data. SQL doesn't have the constructs to do what you want, because what you want makes everything subsequent harder. SQL is designed to use the structure you have already. Learn to use the language the way it's designed. https://xyproblem.info/ — MatBailie, Mar 23 '22 at 23:21
MatBailie - Thank you for the help. You went out of your way to help me and I appreciate that. Please excuse my ignorance on the issue, I'm new to SQL (probably obvious). — anni, Mar 23 '22 at 23:29

score 0 · Accepted Answer · answered Mar 23 '22 at 22:56

First you need a way to identify which columns you want to put a row into. For that you can use ROW_NUMBER()

Then you can use a combination of MAX(CASE) to pivot the data.

WITH
  sorted AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id, code, identifiers)   AS row_id
  FROM
    yourTable
)
SELECT
  id,
  code,
  identifiers,

  MAX(CASE WHEN row_id = 1 THEN colors END)   AS colors_1,
  MAX(CASE WHEN row_id = 1 THEN shapes END)   AS shapes_1,
  MAX(CASE WHEN row_id = 1 THEN numb   END)   AS numb_1,

  MAX(CASE WHEN row_id = 2 THEN colors END)   AS colors_2,
  MAX(CASE WHEN row_id = 2 THEN shapes END)   AS shapes_2,
  MAX(CASE WHEN row_id = 2 THEN numb   END)   AS numb_2,

  MAX(CASE WHEN row_id = 3 THEN colors END)   AS colors_3,
  MAX(CASE WHEN row_id = 3 THEN shapes END)   AS shapes_3,
  MAX(CASE WHEN row_id = 3 THEN numb   END)   AS numb_3
FROM
  sorted
GROUP BY
  id,
  code,
  identifiers

How to move database table rows into columns based on a common entry

1 Answers1