0

I have MySql database with lots of records with date (timestamp) and few more attributes. As an example 'testTable' looks like


a varchar(255)
b int( 11)
timestamp bigint(20)

I need to find top 10 of sum(b) for each day for a period of time say Jan 1st to Jan 15th where those dates can be specified by user.

How will the iterative query look like ? Crude way could be individual select statements with UNION ALL in between.

select a, sum(b) from testTable where FROM_UNIXTIME( timestamp ) between '2012-01-01 05:10:00' and '2012-01-02 05:10:00' group by a order by sum(b) desc LIMIT 10
UNION ALL
select a, sum(b) from testTable where FROM_UNIXTIME( timestamp ) between '2012-01-02 05:10:00' and '2012-01-03 05:10:00' group by a  order by sum(b) desc LIMIT 10
UNION ALL
select a, sum(b) from testTable where FROM_UNIXTIME( timestamp ) between '2012-01-03 05:10:00' and '2012-01-04 05:10:00' group by a order by sum(b) desc LIMIT 10
..
..
..
UNION ALL
select a, sum(b) from testTable where FROM_UNIXTIME( timestamp ) between '2012-01-14 05:10:00' and '2012-01-15 05:10:00' group by a order by sum(b) desc LIMIT 10 ;</br>

But I want it to be more generic where user can run a script with 2 given dates.

output is like
a | FROM_UNIXTIME(timestamp) | sum (b)
-----------+------------------------+------
test | 2012-01-01 03:24:41-04 | 500
test | 2012-01-01 03:19:40-04 | 420
test | 2012-01-01 03:14:39-04 | 261
test | 2012-01-01 03:09:38-04 | 244
test | 2012-01-01 03:04:37-04 | 231
test | 2012-01-01 02:59:36-04 | 223
test | 2012-01-01 02:54:35-04 | 211
test1 | 2012-01-01 02:49:34-04 | 199
test1 | 2012-01-01 03:24:41-04 | 195
test1 | 2012-01-01 03:19:40-04 | 191
new | 2012-01-02 06:11:06-04 | 1000
new | 2012-01-02 06:06:06-04 | 978
new | 2012-01-02 06:01:06-04 | 867
new | 2012-01-02 05:56:05-04 | 786
new | 2012-01-02 05:51:05-04 | 698
new | 2012-01-02 05:46:05-04 | 598
new1 | 2012-01-02 06:11:06-04 | 476
new1 | 2012-01-02 05:41:04-04 | 345
new2 | 2012-01-02 06:06:06-04 | 250
new2 | 2012-01-02 06:01:06-04 | 125

S__R
  • 31
  • 2
  • 5
  • select a, sum(b) from testTable where FROM_UNIXTIME( a ) between '2012-01-01 05:10:00' and '2012-01-02 05:10:00' group by a order by sum(b) desc LIMIT 10 UNION ALL select a, sum(b) from testTable where FROM_UNIXTIME( a ) between '2012-01-03 05:10:00' and '2012-01-04 05:10:00' group by a order by sum(b) desc LIMIT 10 .. .. .. UNION ALL select a, sum(b) from testTable where FROM_UNIXTIME( a ) between '2012-01-14 05:10:00' and '2012-01-15 05:10:00' group by a order by sum(b) desc LIMIT 10 ; – S__R Jul 16 '12 at 15:30
  • Realized that after adding the comments. I have updated the query.Thanks – S__R Jul 16 '12 at 16:05
  • What does the timestamp refer to in that sample resultset, since you're grouping on `a`? Timestamps are typically for individual rows and wouldn't be the same within a whole group. – J. Miller Jul 16 '12 at 16:59
  • timestamp is stored as bigint in database upto the seconds granularity – S__R Jul 16 '12 at 17:24
  • Your question with the union portions and your output don't match up. I posted an answer based on your previous revision of this question. – Tony Jul 16 '12 at 19:23
  • Proper [sample code](http://sscce.org/) (here, [SQL statements](http://weblogs.sqlteam.com/jeffs/archive/2008/05/13/question-needed-not-answer.aspx)) is more useful than any ad hoc schema and sample data format. Please use complete `CREATE TABLE` and `INSERT ... VALUES` for samples. Desired results don't need to be presented as sample code, as results are the output of code and not code themselves. – outis Jul 19 '12 at 10:09
  • Note you can [format lines as code](http://meta.stackexchange.com/questions/22186/) by indenting them four spaces. The "{}" button in the editor toolbar does this for you. Edit your question and try it out. Click the orange question mark in the editor toolbar for more information and tips on formatting. Until SO supports [``](http://meta.stackexchange.com/q/42692/133817), sample output is best marked up with `
    `.
    – outis Jul 19 '12 at 10:10
  • possible duplicate of [mysql limit inside group?](http://stackoverflow.com/q/1153738/), [How do I select multiple items from each group in a mysql query?](http://stackoverflow.com/q/1304599/), and ***MANY*** [others](http://stackoverflow.com/questions/tagged/greatest-n-per-group) (and [yet more](http://stackoverflow.com/search?q=%5Bsql%5D+%2Btop+%2Bn+%2Bgroup&submit=search)). – outis Jul 20 '12 at 07:11

1 Answers1

1

Try this... Change the dates in the between range to pass the range only once.

Corrected for typo, omitted line, and missing comma:

select day, a, tot
from 
   (
   select 
      *,
      @num := if(@day =  tt4.day, @num + 1, 1) as row_number,
      @day := tt4.day as dummy
   from
      (
      select
         ts as day, 
         tt1.a, 
         sum(tt1.b) as tot
      from 
         testTable tt1, 
         ( select distinct date(FROM_UNIXTIME(tt2.timestamp)) as ts
           from   testTable tt2
           where  date(FROM_UNIXTIME(tt2.timestamp)) between cast('2012/01/01' as date) and cast('2012/01/15' as date) ) as tt3
      where 
         date(FROM_UNIXTIME(tt1.timestamp)) = tt3.ts
      group by 
         date(FROM_UNIXTIME(tt1.timestamp)), 
         tt1.a
      order by 
         date(FROM_UNIXTIME(tt1.timestamp)),
         sum(tt1.b) desc,
         tt1.a
      ) as tt4
   ) as tt5
where 
   tt5.row_number <=10

Modified - flavor of SQL changed for Vertica... syntax may be off (I don't have a Vertica installation to test against), but the gist is there.

select day, a, tot
from 
   (
   select 
      *,
      ROW_NUMBER() OVER (PARTITION BY tt4.day) as row_number
   from
      (
      select
         ts as day, 
         tt1.a, 
         sum(tt1.b) as tot
      from 
         testTable tt1, 
         ( select distinct date(TO_TIMESTAMP(tt2.timestamp)) as ts
           from   testTable tt2
           where  date(TO_TIMESTAMP(tt2.timestamp)) between cast('2012/01/01' as date) and cast('2012/01/15' as date) ) as tt3
      where 
         date(TO_TIMESTAMP(tt1.timestamp)) = tt3.ts
      group by 
         date(TO_TIMESTAMP(tt1.timestamp)), 
         tt1.a
      order by 
         date(TO_TIMESTAMP(tt1.timestamp)),
         sum(tt1.b) desc,
         tt1.a
      ) as tt4
   ) as tt5
where 
   tt5.row_number <=10
Tony
  • 2,658
  • 2
  • 31
  • 46
  • This is a variation of the Top N Per Group scenario posted elsewhere... Check out http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ for more details. – Tony Jul 16 '12 at 19:26
  • Tony, I have all sorts of errors when I try to execute the query that you have. – S__R Jul 17 '12 at 12:08
  • Sorry, I posted an interim iteration I was working with. I corrected the code above. – Tony Jul 17 '12 at 21:57
  • Thanks Tony. Any idea to convert this to Vertica vSQL ? Looks like the variable declaration is different in vSQL. – S__R Jul 19 '12 at 12:06
  • Sorry, no... you said MySQL in the question. Is there an online reference for vSQL syntax? I couldn't find one. – Tony Jul 19 '12 at 14:33
  • Looks like Vertica supports row_count()... that makes it pretty easy. Change the variable declarations to "ROW_NUMBER() OVER (PARTITION BY tt4.day) as row_number" – Tony Jul 25 '12 at 14:17
  • Did my revision for Vertica syntax help you? – Tony Sep 25 '12 at 18:57