1

New to MySQL and trying to carry something over from R.

I have a data table with two columns, similar to the following, with a level-2 id and a nested id:

level2id | nestedid |
1        | 1        |
1        | 2        |
1        | 3        |
2        | 1        |
2        | 2        |
...

I want to restructure the data using MYSql in a new table like this:

level2id | nestedid1 | nestedid2 | nestedid3 |
1        | 1         | 2         | 3         |
2        | 1         | 2         |           |
...

This is so that I can later perform joins to extract information on the nested ids to create aggregate values for variables that relate to the level2 id. Trivial to do in R using reshape for "time-varying" data, but can't find an obvious solution to this particular format (i.e. where the data are not organized in terms of attribute names and attribute values in columns. Thanks in advance!

SMM
  • 193
  • 1
  • 3
  • 12
  • `I want to restructure the data using MYSql in a new table like this:` --- it is a very bad idea. What is the original reason to do so? – zerkms Mar 02 '12 at 03:15
  • 2
    I don't think this is easily possible in MySQL - if you're already using R I'd suggest doing reshaping in R. You could try the [`sqldf`](http://code.google.com/p/sqldf/) package to perform SQL-like queries on data frames. – mathematical.coffee Mar 02 '12 at 03:17
  • There are lots of reasons to do this kind of thing. In my case, I need to collect information on individuals (nestedid) and aggregate that information at the household level (level2). I don't just want a bunch of cross-tabs, though, because the specific relationships of the nestedids to each other are important... – SMM Mar 02 '12 at 03:31

3 Answers3

0

I had a similar issue. Maybe you want to have a look at dynamic pivoting in sql. Dynamic pivot table with multiple columns in sql server. However I really do not recommend if you can just use a reshape command in R.

Community
  • 1
  • 1
0

You may use MySQL to create a MySQL program that will fix this:

USE test;

/*Create long input table 'test' with variables of varying length*/
DROP TABLE nums;
CREATE TABLE nums (id INT(2));
INSERT INTO nums
VALUES 
(0), (1), (2), (3), (4), (5), (6), (7);

DROP TABLE test;
CREATE TABLE test (id INT(2), var VARCHAR(5), attribute VARCHAR(6), PRIMARY KEY (id, var));
INSERT INTO test
SELECT nums3.*, REPEAT(CHAR(97+RAND()*24),CAST(6.*RAND() AS INT)) AS attribute
 FROM (SELECT DISTINCT nums2.id1 as id, CONCAT('var', LPAD(CAST(16.*RAND() AS INT),2,'0')) AS var  
 FROM (SELECT DISTINCT nums.id as id1, nums1.id as id2 FROM nums, nums as nums1) AS nums2) AS nums3;

/*Create SQL program to convert long to wide format (R: reshape)*/
SELECT DISTINCT CONCAT('DROP TABLE result;\nCREATE TABLE result (id INT(2),
', GROUP_CONCAT(CONCAT(field) SEPARATOR ', '), ');')
FROM 
(SELECT DISTINCT CONCAT(var, CONCAT(' VARCHAR(', max(length(attribute)), ')')) AS field
FROM test GROUP BY var) AS fields

UNION

SELECT CONCAT("INSERT INTO result \nSELECT DISTINCT test.id, ", GROUP_CONCAT(var SEPARATOR '.attribute, '),
".attribute FROM (SELECT DISTINCT id FROM test) AS test") 
FROM (SELECT DISTINCT var FROM test ORDER BY var) as vars

UNION

SELECT CONCAT("LEFT JOIN test AS ", var, " ON test.id = ", var, ".id AND ", var, ".var=", '"', var, '"' )
FROM (SELECT DISTINCT var FROM test ORDER BY var) as vars

UNION

SELECT ";" ;

/*Copy output to screen editor, delete '|' symbols and superfluous white spaces.
Then copy to MySQL prompt, run by pressing 'enter' key and view 'result'*/
Turdus
  • 1
  • 1
0

While you cannot do it as a SELECT you may achieve this using inserts which only works if the primary key ls level2id or you have a unique index on that level2id

Table structure

CREATE TABLE `table2` (
  `level2id` int(11) NOT NULL DEFAULT '0',
  `nestedid1` int(11) NOT NULL,
  `nestedid2` int(11) NOT NULL,
  `nestedid3` int(11) NOT NULL,
  PRIMARY KEY (`level2id`)
) ENGINE=InnoDB;

The insert SQL statements replace table1 with your old table

INSERT INTO table2 (level2id, nestedid1) SELECT level2id, nestedid FROM table1 WHERE nestedid = 1 ON DUPLICATE KEY UPDATE nestedid1 = nestedid;
INSERT INTO table2 (level2id, nestedid2) SELECT level2id, nestedid FROM table1 WHERE nestedid = 2 ON DUPLICATE KEY UPDATE nestedid2 = nestedid;
INSERT INTO table2 (level2id, nestedid3) SELECT level2id, nestedid FROM table1 WHERE nestedid = 3 ON DUPLICATE KEY UPDATE nestedid3 = nestedid;

The ON DUPLICATE KEY UPDATE is a MySQL extension more details here http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

Stephen Senkomago Musoke
  • 3,528
  • 2
  • 29
  • 27