2

I'm just starting to learn MySQL and I have encountered this problem which I badly needed the solution or just the logic.

For example I have this table:

id         a 
--        --
 1         1
 2         2
 3         3
 4         5
 5         6
 6         7
 7         9
 8        10
 9        11
10        12

Now, what I want is to display all the data in column a which should be grouped by series. In this case the result should be:

series_start|series_end|count
------------+----------+-----
           1          3     3
           5          7     3
           9         12     4

This needs a lot of subqueries and joins. I just can't figure it out by now.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • 1
    Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. – Kermit Oct 11 '13 at 13:36
  • It needs several JOINs, but no subqueries ;-) – Strawberry Oct 11 '13 at 14:15
  • SELECT a.a start , MIN(c.a) end , MIN(c.a) - a.a + 1 diff FROM my_table a LEFT JOIN my_table b ON a.a = b.a + 1 LEFT JOIN my_table c ON c.a >= a.a LEFT JOIN my_table d ON d.a = c.a+1 WHERE b.a IS NULL AND c.a IS NOT NULL AND d.a IS NULL GROUP BY a.a; – Strawberry Oct 11 '13 at 14:24

2 Answers2

2

This is a problem, and here's another method of solving it, which also uses variables:

SELECT
  MIN(a) AS series_start,
  MAX(a) AS series_end,
  MAX(a) - MIN(a) + 1 AS series_count
FROM (
  SELECT
    a,
    @r := @r + 1 AS r
  FROM
    yourtable,
    (SELECT @r := 0) AS x
  ORDER BY
    a
) s
GROUP BY
  a - r
ORDER BY
  a - r
;

This is how it works.

The subquery assigns row numbers to the table rows and returns this row set:

 a   r
--  --
 1   1
 2   2
 3   3
 5   4
 6   5
 7   6
 9   7
10   8
11   9
12  10

In this case the r column, which stores the row numbers, happens to match the id column in your data sample, but I'm assuming that in general the id column may have gaps, and for that reason it cannot be used here.

The main query groups the results by the difference between r and a: for sequential values, it will always be the same:

 a   r  a - r
--  --  -----
 1   1      0
 2   2      0
 3   3      0
 5   4      1
 6   5      1
 7   6      1
 9   7      2
10   8      2
11   9      2
12  10      2

and that allows us to group such rows together. All that remains at this point is to get the minimim, maximum and count, which gives you this output:

series_start  series_end  series_count
------------  ----------  ------------
           1           3             3
           5           7             3
           9          12             4

A SQL Fiddle demonstration of this query, for which I've borrowed @sgeddes's schema, can be found here.


UPDATE

As numeric variables cannot be used (according to comments), you could assign row numbers using a triangular self-join, but it will be much less efficient than using a variable. Anyway, here's the modified version, changes to the previous query being highlighted in bold:

SELECT
  MIN(a) AS series_start,
  MAX(a) AS series_end,
  MAX(a) - MIN(a) + 1 AS series_count
FROM (
  SELECT
    data.a,
    COUNT(*) AS r
  FROM
    yourtable AS data
  INNER JOIN
    yourtable AS tally
  ON
    data.id >= tally.id
  GROUP BY
    data.a
) s
GROUP BY
  a - r
ORDER BY
  a - r
;

The approach itself remains unchanged: the subquery returns a ranked row set, which is then processed same as previously.

A SQL Fiddle demo for the modified query is available here.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • O.O I can't use the method above. Our company is using a CGI which treats any word that stars with '@' as a variable and is treated as a normal string which will be concatinated to the statement itself. Without an initial value, it will be left as blank causing an error. – splakard nanards Oct 14 '13 at 03:26
  • The `@r` is indeed a variable in this context, but I don't really understand what you are saying about how your CGI treats a word starting with a `@`. Does it treat it as a variable or as a string, after all? Anyway, if it's only the `(SELECT @r := 0)` subquery that is the issue, you could try removing it and replacing the `@r := @r + 1 AS r` with `@r := IFNULL(@r, 0) + 1 AS r`. – Andriy M Oct 14 '13 at 05:41
  • Any word that starts with an '@' is treated as a variable(of string type). I could not use also the := to assign values, it causes an error. – splakard nanards Oct 16 '13 at 08:30
  • @Tuyhakaw: Ah, I see now. Please see my update for an alternative version. – Andriy M Oct 16 '13 at 11:58
0

Here's one solution using user defined variables:

select min(series_start) series_start, 
  max(series_end) series_end,
  1 + max(series_end) - min(series_start) count
from (
  select t1.a series_start, 
    t2.a series_end,
    @val:=IF(@prev=t2.a-1,@val,@val+1) val,
    @prev:=t2.a
  from yourtable t1
    join yourtable t2 on t1.a = t2.a-1
    join (select @val:= 0, @prev:= 0) t
  order by t2.a
  ) t
group by val
sgeddes
  • 62,311
  • 6
  • 61
  • 83
  • It's working fine, but I need to read more to understand fully this user defined variables method. Thanks for the answer. :) – splakard nanards Oct 10 '13 at 01:54
  • @Tuyhakaw -- no worries, glad I could help! Basically the `val` variable keeps track of sequential records. Grouping by that variable, you can use aggregate functions to achieve your results. Best regards. – sgeddes Oct 10 '13 at 02:01
  • By the way I have additional question: What does a simple `join` do to tables? I know about `left`,`right` and other `joins` but not this one. – splakard nanards Oct 10 '13 at 02:36
  • @Tuyhakaw -- using `join` by itself is like saying `inner join` -- it's just a shorthand. Here's a good article on the different types of joins (inner vs. outer): http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html – sgeddes Oct 10 '13 at 02:48
  • O.O I can't use the method above. Our company is using a CGI which treats any word that stars with '@' as a variable and is treated as a normal string which will be concatinated to the statement itself. Without an initial value, it will be left as blank causing an error. – splakard nanards Oct 14 '13 at 03:25