0

I made the following query to select persons by age group, as count and as percentage. Ages are stored as 0000-00-00 in my database.

SELECT  AgeGroup, count(*) AS count, ROUND(sum( 100 ) / total) AS percentage
        FROM    (
        SELECT  case
                when  age between 0 and 17 then '00 - 17'
                when  age between 18 and 24 then '18 − 24'
                when  age between 25 and 34 then '25 − 34'
                when  age between 35 and 44 then '35 − 44'
                when  age between 45 and 54 then '45 − 54'
                when  age between 55 and 64 then '55 − 64'
                when  age between 65 and 125 then '65+'
                else 'Unknown'
                end AS AgeGroup
        FROM    (
                SELECT  ROUND(DATEDIFF(Cast(NOW() as Date),
                            Cast(dateofbirth as Date)) / 365, 0) as age
                FROM    people
                ) as SubQueryAlias
        ) as SubQueryAlias2
        CROSS JOIN (SELECT count( * ) AS total FROM people)x
        group by
        AgeGroup

The current result is:

AgeGroup | count | percentage
00 - 17    33      1
18 − 24    235     5
..         ..      ..

What I need is a addition to the query to separate the results in male/female/unknown:

AgeGroup  | gender | count | percentage
00 - 17     M        33      1
00 - 17     F        33      1
..          ..       ..      ..
Citizen SP
  • 1,411
  • 7
  • 36
  • 68

2 Answers2

1

You might have the easiest time by defining range-tables. This also prevents you from needing to do date math on every entry, and so may be more efficient for the grouping.

First, a range table for ages:

SELECT '00 - 17' AS ageGroup, CURRENT_DATE AS lower, CURRENT_DATE - INTERVAL 18 YEAR AS upper
UNION ALL
SELECT '18 - 24', CURRENT_DATE - INTERVAL 18 YEAR, CURRENT_DATE - INTERVAL 25 YEAR
UNION ALL
SELECT '25 - 34', CURRENT_DATE - INTERVAL 25 YEAR, CURRENT_DATE - INTERVAL 35 YEAR
UNION ALL
SELECT '35 - 44', CURRENT_DATE - INTERVAL 35 YEAR, CURRENT_DATE - INTERVAL 45 YEAR
UNION ALL
SELECT '45 - 54', CURRENT_DATE - INTERVAL 45 YEAR, CURRENT_DATE - INTERVAL 55 YEAR
UNION ALL
SELECT '55 - 64', CURRENT_DATE - INTERVAL 55 YEAR, CURRENT_DATE - INTERVAL 65 YEAR
UNION ALL
SELECT '65+', CURRENT_DATE - INTERVAL 65 YEAR, null
UNION ALL
SELECT 'Unknown', null, null

SQL FIddle Demo

...which generates a table about like you'd expect. Note that the upper-bound is exclusive, which is why it uses the same value as the lower bound of the next row. Note also that 1) the '65+' bracket has no upper bound, and 2) the 'Unknown' bracket has neither.

Of course, we also need a Gender table:

SELECT 'M' AS gender
UNION ALL
SELECT 'F'
UNION ALL
SELECT 'Unknown'

(As a side note, I'd normally be using a multi-line VALUES(...) statements, but SQL Fiddle seems to dislike the syntax in subqueries for MySQL for some reason. Use whichever you're comfortable with.)

There's one last piece of knowledge we need:
Specifically, COUNT(<expression>) will ignore null rows. We can thus stitch together the full query similarly to:

SELECT AgeRange.ageGroup, Gender.gender, 
       COUNT(People.id), ROUND(100 * COUNT(People.id) / Total.countOfPeople) AS percentage
FROM (SELECT '00 - 17' AS ageGroup, CURRENT_DATE AS lower, CURRENT_DATE - INTERVAL 18 YEAR AS upper
      UNION ALL
      SELECT '18 - 24', CURRENT_DATE - INTERVAL 18 YEAR, CURRENT_DATE - INTERVAL 25 YEAR
      UNION ALL
      SELECT '25 - 34', CURRENT_DATE - INTERVAL 25 YEAR, CURRENT_DATE - INTERVAL 35 YEAR
      UNION ALL
      SELECT '35 - 44', CURRENT_DATE - INTERVAL 35 YEAR, CURRENT_DATE - INTERVAL 45 YEAR
      UNION ALL
      SELECT '45 - 54', CURRENT_DATE - INTERVAL 45 YEAR, CURRENT_DATE - INTERVAL 55 YEAR
      UNION ALL
      SELECT '55 - 64', CURRENT_DATE - INTERVAL 55 YEAR, CURRENT_DATE - INTERVAL 65 YEAR
      UNION ALL
      SELECT '65+', CURRENT_DATE - INTERVAL 65 YEAR, null
      UNION ALL
      SELECT 'Unknown', null, null)  AgeRange
CROSS JOIN (SELECT 'M' AS Gender
            UNION ALL
            SELECT 'F'
            UNION ALL 
            SELECT 'Unknown') Gender
CROSS JOIN (SELECT COUNT(*) countOfPeople
            FROM People) Total
LEFT JOIN People
       ON ((People.dateOfBirth > AgeRange.upper AND dateOfBirth <= AgeRange.lower)
           OR (People.dateOfBirth <= AgeRange.lower AND AgeRange.upper IS NULL)
           OR (AgeRange.lower IS NULL AND AgeRange.upper IS NULL AND People.dateOfBirth IS NULL))
          AND (Gender.gender = People.gender
               OR Gender.gender = 'Unknown' AND People.gender IS NULL)
GROUP BY AgeRange.ageGroup, Gender.gender

SQL Fiddle Demo
(note the Fiddle demo uses the date of this post, '2014-07-21', as CURRENT_DATE, to make the age range query stable for future readers).

Clockwork-Muse
  • 12,806
  • 6
  • 31
  • 45
0

I really hope I am wrong about it ....but would the reason of constant error be...you didn't select the gender?

Also, a nerdy side note, 365 days doesn't make a year, it's roughly 365.25 days XD which mean your equation is slightly off haha

SELECT  AgeGroup, gender, count(*) AS count, ROUND(sum( 100 ) / total) AS percentage
    FROM    (
    SELECT  case
            when  age between 0 and 17 then '00 - 17'
            when  age between 18 and 24 then '18 − 24'
            when  age between 25 and 34 then '25 − 34'
            when  age between 35 and 44 then '35 − 44'
            when  age between 45 and 54 then '45 − 54'
            when  age between 55 and 64 then '55 − 64'
            when  age between 65 and 125 then '65+'
            else 'Unknown'
            end AS AgeGroup, gender
    FROM    (
            SELECT  ROUND(DATEDIFF(Cast(NOW() as Date),
                        Cast(dateofbirth as Date)) / 365, 0) as age, 
                        gender
            FROM    people
            ) as SubQueryAlias
    ) as SubQueryAlias2
    CROSS JOIN (SELECT count( * ) AS total FROM people)x
    group by
    AgeGroup, gender
Jacky Cheng
  • 1,536
  • 1
  • 10
  • 22