Normalizing table data in mySQL procedurally

Question

I have a large data set in a denormalized format. Here is an example of the column names:

foreign_key_ID, P1, P2, P3, P4, P5.... D1, D2, D3.... etc..

These fields all contain similar type of data.

I need to normalize this into my existing table structure:

insert into new_table (id, name, index)
select foreign_key_id, P1, 1
from denormalized_table;

But that means that I need to run separate queries for each field in my denormalized table, just changing a few things:

insert into new_table (id, name, index)
select foreign_key_id, P2, 2
from denormalized_table;

This is getting tedious considering how many of these fields I have.

Is there a way this can be automated into a single operation? I.e.: iterate through the fields (I don't mind creating a list of eligible fields once, somewhere), pull off the last digit of that field name (ie "1" in "P1" and "2" for "P2") use the field name and the extracted index # in the sub-select.

score 1 · Answer 1 · edited May 23 '17 at 11:48

hat you're looking for is Dynamic SQL. This is where you execute SQL statements that you can assemble programmatically. You can run any arbitrary SQL code that's in a string, as long as you're in a Stored Procedure. See this link: How To have Dynamic SQL in MySQL Stored Procedure

Basically, you can build a string using mySQL statements by iterating over a set of columns. You can use the SHOW COLUMNS syntax (see http://dev.mysql.com/doc/refman/5.0/en/show-columns.html) to return a collection then loop over that resultset and build your dynamic query string and execute that way.

SHOW COLUMNS FROM myTable WHERE Field NOT IN (pkey, otherFieldIDontWantToInclude)

This is really interesting stuff, and nothing I've ever tried before. In combination with @spencer7593's response, this might be the trick I'm looking for. Thanks for the links. — Tom Auger, Feb 11 '13 at 15:33

spencer7593 · Accepted Answer · 2013-02-04T19:37:22.137

Here's a start:

SELECT column_name, substr(column_name,2) AS `index`
  FROM information_schema.columns
 WHERE table_schema = 'mydatabasename'
   AND table_name = 'denormalized_table'
   AND column_name REGEXP '^[PD][0-9]+$'
 ORDER BY column_name

You can modify the select list in that statement, to have MySQL generate statements for you:

SELECT CONCAT('INSERT INTO new_table (id, name, `index`) SELECT foreign_key_id, '
         ,column_name,', ',substr(column_name,2)
         ,' FROM denormalized_table ;') AS stmt
  FROM information_schema.columns
 WHERE table_schema = 'mydatabasename'
   AND table_name = 'denormalized_table'
   AND column_name REGEXP '^[PD][0-9]+$'
 ORDER BY column_name

The output from that would be a set of MySQL INSERT statements that you could then execute.

If the number of rows and total size of the data to be inserted is not too large, you could and you want to get the whole conversion done in "one operation", then you could generate a single INSERT INTO ... SELECT statement, using the UNION ALL operator. I would get the majority of the statement like this:

SELECT CONCAT('UNION ALL SELECT foreign_key_id, '
         ,column_name,', ',substr(column_name,2)
         ,' FROM denormalized_table ') AS stmt
  FROM information_schema.columns
 WHERE table_schema = 'mydatabasename'
   AND table_name = 'denormalized_table'
   AND column_name REGEXP '^[PD][0-9]+$'
 ORDER BY column_name

I would take the output from that, and replace the very first UNION ALL with an INSERT INTO .... That would give me a single statement to run to get the whole conversion done.

Interesting. So you're suggesting using SQL to generate the SQL that you would then copy and paste into a new query to actually execute the inserts? — Tom Auger, Feb 04 '13 at 19:44
No, the results of his query will generate a SQL statement per row, and you can loop over that result-set and use EXECUTE on the returned rows. See my post about Dynamic SQL. — Richthofen, Feb 04 '13 at 19:50
@Tom Auger: yes, for an admin function like a one time import of data, I will use SQL and the information_schema database to help me generate SQL statements. (I'd rather do that, than do a bunch of tedious editing.) — spencer7593, Feb 04 '13 at 21:22

Normalizing table data in mySQL procedurally

2 Answers2