0

I have a query that pulls some aggregate stats by age group

    Agegroup    Freq
    0-5         2.3
    6-10        3.2
    11-15       3.6

For various reasons, I need the output data to be a lookup table for every age 1-100 of the following format

    Age Agegroup    Freq
    1   0-5         2.3
    2   0-5         2.3
    3   0-5         2.3
    4   0-5         2.3
    5   0-5         2.3
    6   6-10        3.2
    7   6-10        3.2
    8   6-10        3.2
    9   6-10        3.2
    10  6-10        3.2
...

How could I go about doing this? I'm not able to create tables, so I'm thinking if there's a way to write some kind of select statement that will have all ages 1-100 and the agegroup and then join it to the original query which has the calculated frequencies by agegroup - something like this

SELECT t1.age, [case when statemenet that assigns the correct age group from t1.Age] "Agegroup"

FROM ([statemement that generates numbers 1-100] "age") t1

JOIN (Original query that creates the aggreated agegroup data) t2 on t1.Agegroup = t2.Agegroup

So I have two questions

  1. Is this an approach that makes sense at all?
  2. Is it possible to generate the t1 I'm looking for? I.e. a select statement that will create a t1 of the form

    Age Agegroup
    1   0-5
    2   0-5
    3   0-5
    4   0-5  
    5   0-5 
    6   6-10
    7   6-10
    8   6-10
    9   6-10
    10  6-10 
    

    ...

that could then be joined with the query that has the frequency by agegroup?

L Xandor
  • 1,659
  • 4
  • 24
  • 48
  • 2
    This will generate numbers 1-100: `select rownum as n from dual connect by level <= 100;` – Tony Andrews Aug 08 '16 at 17:45
  • 3
    Possible duplicate of [SQL to generate a list of numbers from 1 to 100](http://stackoverflow.com/questions/2847226/sql-to-generate-a-list-of-numbers-from-1-to-100) – sstan Aug 08 '16 at 17:58
  • Your example output doesn't have age 0, is that intentional? Are all the ranges 5 years (except the first one, if zero is included)? And is it possible to modify the original query instead - maybe this could be done in one go with analytic functions, for instance. – Alex Poole Aug 08 '16 at 18:06
  • @TonyAndrews thanks, that's exactly what I was looking for. – L Xandor Aug 10 '16 at 19:31

4 Answers4

1

Something like this... I included age 0 (it can be excluded if need be), and I only went through age 15. That is hard-coded; with a little extra work, it can be made to match the highest age in the ranges.

This version does unnecessary work, because it computes the substrings repeatedly. It may still execute in less than a second, but if performance becomes important, it can be written to compute those substrings in a CTE first, so they are not computed repeatedly. (Not shown here.)

with
     inputs (agegroup, freq ) as (
       select '0-5',   2.3 from dual union all
       select '6-10',  3.2 from dual union all
       select '11-15', 3.6 from dual
     )
select c.age, i.agegroup, i.freq
from   (select level - 1 as age from dual connect by level <= 16) c
       inner join inputs i
       on age between to_number(substr(i.agegroup, 1, instr(i.agegroup, '-') - 1))
              and     to_number(substr(i.agegroup, instr(i.agegroup, '-') + 1))
order by age
;

Output:

 AGE AGEGROUP        FREQ
---- --------- ----------
   0 0-5              2.3
   1 0-5              2.3
   2 0-5              2.3
   3 0-5              2.3
   4 0-5              2.3
   5 0-5              2.3
   6 6-10             3.2
   7 6-10             3.2
   8 6-10             3.2
   9 6-10             3.2
  10 6-10             3.2
  11 11-15            3.6
  12 11-15            3.6
  13 11-15            3.6
  14 11-15            3.6
  15 11-15            3.6

16 rows selected.
1

Here is a different solution, using a hierarchical query. It doesn't need "magic numbers" anymore, the ages are logically determined by the ranges, and there's no join (other than whatever the query engine does behind the scenes in the hierarchical query). On the admittedly very small sample you provided, the optimizer cost is about 20% less than the join-based solution I provided - that may result in slightly faster execution.

(NOTE - I posted two different solutions so I believe these are separate Answers - as opposed to editing my earlier post. I wasn't sure which action is appropriate.)

Also another note to acknowledge that @AlexPoole mentioned this approach in his post; I didn't see it till now, or I would have acknowledged it from the outset.

with
     inputs (agegroup, freq ) as (
       select '0-5',   2.3 from dual union all
       select '6-10',  3.2 from dual union all
       select '11-15', 3.6 from dual
     )
select  to_number(substr(agegroup, 1, instr(agegroup, '-') - 1)) + level - 1 as age,
        agegroup, freq
from    inputs
connect by  level <= 1 + to_number(substr(agegroup, instr(agegroup, '-') + 1)) - 
                         to_number(substr(agegroup, 1, instr(agegroup, '-') - 1))  
        and prior agegroup = agegroup
        and prior sys_guid() is not null
order by age
;
  • That is what I was referring to, but very vaguely *8-) I tend to prefer recursive CTEs over forcing the non-detrmistic prior clause now, but this might be better on such a small data set. – Alex Poole Aug 08 '16 at 21:19
  • 1
    @AlexPoole - I do prefer recursive CTEs too, in general; some (not entirely formal) testing I've done with splitting large CSV's seemed to suggest the recursive query is considerably faster than hierarchical queries. One benefit of hierarchical queries it that they can be used with pre-11.2 versions. –  Aug 08 '16 at 21:26
  • Yes, meant to mention that too. – Alex Poole Aug 08 '16 at 21:28
0

An alternative approach, if you're on 11gR2 or higher, is to use recursive subquery factoring with a regular expression to extract the lower and upper age in each range from your string value;

with original_query (agegroup, freq) as (
  -- Original query that creates the aggreated agegroup data
  select '0-5', 2.3 from dual
  union all select '6-10', 3.2 from dual
  union all select '11-15', 3.6 from dual
),
r (age, agegroup, freq) as (
  select to_number(regexp_substr(agegroup, '\d+', 1, 1)), agegroup, freq
  from original_query
  union all
  select age + 1, agegroup, freq
  from r
  where age < to_number(regexp_substr(agegroup, '\d+', 1, 2))
)
select age, agegroup, freq
from r
order by age;

       AGE AGEGR       FREQ
---------- ----- ----------
         0 0-5          2.3
         1 0-5          2.3
         2 0-5          2.3
         3 0-5          2.3
         4 0-5          2.3
         5 0-5          2.3
         6 6-10         3.2
         7 6-10         3.2
         8 6-10         3.2
         9 6-10         3.2
        10 6-10         3.2
        11 11-15        3.6
        12 11-15        3.6
        13 11-15        3.6
        14 11-15        3.6
        15 11-15        3.6

The anchor member gets each original row from your existing result set, and extracts the lower-bound number (0, 6, 11, ...) using a simple regular expression - that could also be done with substr/instr.

The recursive member than repeats each of those anchor rows, adding one to the age each time, until it reaches the upper-bound number of the range.

You could use connect by as well, but it's a bit more awkward with multiple source rows.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318
0

Answers on questions:

  1. Yes, using join approach with generated table "t1" is good idea.

  2. To generate table "t1" you can use next query:

    SELECT age as "Age", 
           CASE l_age WHEN 0 THEN 0 ELSE l_age + 1 END || '-' || r_age AS "Agegroup"
      FROM (
      SELECT lvl age,
             CASE m5 WHEN 0 THEN (t5-1)*5 ELSE t5 *5 END l_age,
             CASE m5 WHEN 0 THEN t5 *5 ELSE (t5+1)*5 END r_age
       FROM (
       SELECT /*+ cardinality(100) */
              level lvl, mod(level, 5) m5, TRUNC(level/5) t5
         FROM dual
      CONNECT BY level <= 100
     )
    );
    

Output:

            Age Agegroup
            1   0-5
            2   0-5
            3   0-5
            4   0-5
            5   0-5
            6   6-10
            7   6-10
            8   6-10
            9   6-10
            10  6-10
            11  11-15
            12  11-15
            13  11-15
            14  11-15
            15  11-15
            16  16-20
            17  16-20
            18  16-20
            19  16-20
            20  16-20
            21  21-25
            22  21-25
            23  21-25
            24  21-25
            25  21-25
            26  26-30
            27  26-30
            28  26-30
            29  26-30
            30  26-30
            .........

            80  76-80
            81  81-85
            82  81-85
            83  81-85
            84  81-85
            85  81-85
            86  86-90
            87  86-90
            88  86-90
            89  86-90
            90  86-90
            91  91-95
            92  91-95
            93  91-95
            94  91-95
            95  91-95
            96  96-100
            97  96-100
            98  96-100
            99  96-100
            100 96-100