4

I have a table to data that I want to export to a CSV. Ideally, I'd like to switch the rows and columns around, so that the data is grouped a little better.

To explain further, currently, the database looks like this..

data_id     data_timestamp         data_value    
--------------------------------------------
1           2011-07-07 00:01:00    0.400  
1           2011-07-07 00:02:00    0.500
1           2011-07-07 00:03:00    0.600
1           2011-07-07 00:04:00    0.700
2           2011-07-07 00:01:00    0.100  
2           2011-07-07 00:02:00    0.200
2           2011-07-07 00:03:00    0.250
2           2011-07-07 00:04:00    2.300

What I'd like to do group the data_value by the data_timestamp value, so that the timestamps are grouped, and each data_value for each data_id is shown in a column, instead of a row.

data_timestamp         input_1    input_2     
--------------------------------------------
2011-07-07 00:01:00    0.400      0.100  
2011-07-07 00:02:00    0.500      0.200
2011-07-07 00:03:00    0.600      0.250
2011-07-07 00:04:00    0.700      2.300

Below is the query i'm using...

SELECT d.data_timestamp, d.input_1, d.input_2
FROM (
    SELECT data_timestamp,
    IF(data_id=1,data_value,NULL) AS 'input_1',
    IF(data_id=2,data_value,NULL) AS 'input_2' FROM data
) AS d ORDER BY data_timestamp ASC

But it's not quite what i'm wanting, as there are now NULL values whenever one data_id doesn't have a value. GROUP BY seems to group the data_value's as well, which isn't what I want.

Any suggestions?

EDIT:

I've already tried using WHERE d.input_1 IS NOT NULL in the outer query, but can't quite get the results..

Before the WHERE...

data_timestamp         input_1    input_2     
--------------------------------------------
2011-07-07 00:01:00    0.400      NULL
2011-07-07 00:01:00    NULL       0.100  
2011-07-07 00:02:00    0.500      NULL
2011-07-07 00:02:00    NULL       0.200
2011-07-07 00:03:00    0.600      NULL
2011-07-07 00:03:00    NULL       0.250
2011-07-07 00:04:00    0.700      NULL
2011-07-07 00:04:00    NULL       2.300

Adding WHERE d.input_1 IS NOT NULL will drop the input_2 values..

data_timestamp         input_1    input_2     
--------------------------------------------
2011-07-07 00:01:00    0.400      NULL
2011-07-07 00:02:00    0.500      NULL
2011-07-07 00:03:00    0.600      NULL
2011-07-07 00:04:00    0.700      NULL

Also, in reality, I have about 20 id's to group by, so wouldn't be the best of ideas to OR all of them either..

crawf
  • 9,448
  • 10
  • 33
  • 43

2 Answers2

10

PIVOTing is neither easy (nor nice) to do. I prefer using CASE:

SELECT d.data_timestamp
     , SUM( CASE WHEN data_id =  1 THEN data_value ELSE 0 END ) AS 'input_1'
     , SUM( CASE WHEN data_id =  2 THEN data_value ELSE 0 END ) AS 'input_2'
     ...
     , SUM( CASE WHEN data_id = 20 THEN data_value ELSE 0 END ) AS 'input_20'
FROM data 
GROUP BY data_timestamp
ORDER BY data_timestamp ASC

but IF works in MySQL as well:

SELECT d.data_timestamp
     , SUM( IF(data_id =  1, data_value, 0) ) AS 'input_1'
     , SUM( IF(data_id =  2, data_value, 0) ) AS 'input_2'
     ...
     , SUM( IF(data_id = 20, data_value, 0) ) AS 'input_20'
FROM data 
GROUP BY data_timestamp
ORDER BY data_timestamp ASC

Alternatively, you can use a 20-level JOIN:

SELECT d.data_timestamp
     , d01.data_value AS 'input_1'
     , d02.data_value AS 'input_2'
     ...
     , d20.data_value AS 'input_20'
FROM
  ( SELECT DISTINCT d.data_timestamp
    FROM data
  ) AS d 
  LEFT JOIN data AS d01
    ON d01.data_timestamp = d.data_timestamp
    AND d01.data_id =  1
  LEFT JOIN data AS d02
    ON d02.data_timestamp = d.data_timestamp
    AND d02.data_id =  2
  ...                                   --- 20 JOINs
  LEFT JOIN data AS d20
    ON d20.data_timestamp = d.data_timestamp
    AND d20.data_id = 20
ORDER BY d.data_timestamp ASC
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • I'd prefer not to use a 20-level join (it makes me nervous), but you CASE (and IF) is actually what i'd used a long time ago (which since forgot). What got me was the use of SUM, which actually doesn't SUM all the grouped results. Thanks so much! – crawf Jul 07 '11 at 05:59
  • 1
    @crawf: You can test though the 20-level join. It may be faster :) – ypercubeᵀᴹ Jul 08 '11 at 05:30
  • after a few test, its slighty faster than the IF solution. I'm surprised, I didn't think joining that many times would be any quicker. Thanks again! – crawf Jul 08 '11 at 05:59
  • @crawf: Welcome, test with big tables, like 100K rows and more to be sure, (and keep the faster one). If there is chance you'll get to use it with so many data. – ypercubeᵀᴹ Jul 08 '11 at 06:06
5

Just join the table to itself!

SELECT dt1.data_timestamp, dt1.input_1, dt2.input_2
FROM data_timestamp dt1
JOIN data_timestamp dt2 
    on dt1.data_timestamp = dt2.data_timestamp 
    and dt2.input_1 is null
WHERE dt1.input_2 is null;

Note that this query assumes input_2's value are present for every input_1 value. If that's not that case, use LEFT JOIN or CROSS JOIN etc

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • 1
    if I could accept two answers, I would! That's why I voted it up..the problem is that I have 20 data ids to join on. With yor solution, I'd need to have 20 different joins, which I'm not fond of (and not sure how that would go down performance-wise). I didnt mean to offend by not accepting, it'd be perfect for a table of 2 data ids! – crawf Jul 07 '11 at 21:31