1

I have a requirement where I need o group data into equal number ob rows. As mysql doesn't have rownum() I'm simulating this behaviour:

SET @row:=6; 
SELECT MAX(agg.timestamp) AS timestamp, MAX(agg.value) AS value, COUNT(agg.value) AS count 
FROM 
    (
        SELECT timestamp, value, @row:=@row+1 AS row 
        FROM data 
        WHERE channel_id=52 AND timestamp >= 0 ORDER BY timestamp
    ) AS agg 
GROUP BY row div 8
ORDER BY timestamp ASC;

Note: according to Can grouped expressions be used with variable assignments? this query may not be 100% correct, but it does work.

An additional requirement is to calculate the row difference between the grouped sets. I've looked for a solution joining the same table with a subquery:

SET @row:=6; 
SELECT MAX(agg.timestamp) AS timestamp, MAX(agg.value) AS value, COUNT(agg.value) AS count 
FROM 
    (
        SELECT timestamp, value, @row:=@row+1 AS row 
        FROM data 
        WHERE channel_id=52 AND timestamp >= 0 ORDER BY timestamp
    ) AS agg 
LEFT JOIN data AS prev
    ON prev.channel_id = agg.channel_id
    AND prev.timestamp = (
        SELECT MAX(timestamp) 
        FROM data
        WHERE data.channel_id = agg.channel_id
            AND data.timestamp < MIN(agg.timestamp)
    )
GROUP BY row div 8
ORDER BY timestamp ASC;

Unfortunately that errors:

Error Code: 1054. Unknown column 'agg.channel_id' in 'on clause'

Any idea how this query could be written?

Community
  • 1
  • 1
andig
  • 13,378
  • 13
  • 61
  • 98
  • It should just be a matter of exposing `channel_id` in the select list for the `agg` subquery. Currentl you only have it in the `WHERE` clause. (I haven't analyzed whether that will give you the desired result, but it will clear the 1054 error) – Michael Berkowski Jun 09 '14 at 19:50

2 Answers2

2

You never selected channel_id from your sbuquery, so it's not returned to the parent query, and is therefore invisible. Try

SELECT MAX(agg.timestamp) AS timestamp, MAX(agg.value) AS value, COUNT(agg.value) AS count 
FROM 
    (
        SELECT timestamp, value, @row:=@row+1 AS row, channel_id
                                                    ^^^^^^^^^^^^-- need this
        FROM data 

Since MySQL only sees and uses the fields you explicitly return from that subquery, and will NOT "dig deeper" into the table underlying the query, you need to select/return all of the fields you'll be using the parent queries.

Marc B
  • 356,200
  • 43
  • 426
  • 500
0

How about this version:

SELECT MAX(agg.timestamp) AS timestamp, MAX(agg.value) AS value, COUNT(agg.value) AS count, COALESCE(prev.timestamp, 0) AS prev_timestamp
FROM (SELECT d.*, @row:=@row+1 AS row 
      FROM data d CROSS JOIN
           (select @row := 6) vars
      WHERE channel_id = 52 AND timestamp >= 0 ORDER BY timestamp
     )  agg LEFT JOIN
    data prev
    ON prev.channel_id = agg.channel_id AND
       prev.timestamp = (SELECT MAX(timestamp) 
                         FROM data
                         WHERE data.channel_id = agg.channel_id AND
                               data.timestamp < agg.timestamp
                        )
GROUP BY row div 8
ORDER BY timestamp ASC;

This includes all the columns in the subquery. And it puts the variable initialization in the same query.

andig
  • 13,378
  • 13
  • 61
  • 98
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I like the cross join, but "Error Code: 1111. Invalid use of group function" – andig Jun 09 '14 at 20:20
  • @andig . . . That is in your subquery. I'm not sure what that is supposed to do, but you can't put a correlated subquery in a `from` clause. – Gordon Linoff Jun 09 '14 at 20:35
  • I need the subquery to calculate the timestamp row by row difference to the previous, aggregated row. I've rewritten your answer to eliminate the error and deliver the desired result. – andig Jun 11 '14 at 09:49
  • just tried this on my production database. On a table with only 40k records and channel_id/timestamp index event the explain plan won't return. Seems the subquery is severely killing performance here? – andig Jun 25 '14 at 15:58
  • The explain plan not returning is a different issue. The performance of the query should be reasonable. That index should work for both subqueries and the amount of data is not enormously large. – Gordon Linoff Jun 25 '14 at 21:09
  • I've put the performance question here: http://stackoverflow.com/questions/24457442/how-to-find-previous-record-n-per-group-maxtimestamp-timestamp – andig Jun 27 '14 at 17:16