Counting multiple columns & groups from one UNION ALL query

Question

I am attempting to get counts of items in multiple columns in a union all-ed query.

I have a MySQL database of trips taken, with location data for origin and destination for each trip. The columns are as such:

o_continent	o_country	o_state	o_city	d_continent	d_country	d_state	d_city	tripdate
North America	United States	California	San Diego	North America	United States	Arizona	Phoenix	2020-10-01
North America	United States	California	Los Angeles	North America	United States	California	Santa Rosa	2020-04-03
North America	United States	California	Los Angeles	North America	United States	Washington	Spokane	2020-03-01
North America	United States	Oregon	Portland	North America	Canada	Alberta	Calgary	2020-01-22
North America	Canada	Alberta	Edmonton	North America	United States	California	Los Angeles	2020-04-01

I am merging the origins and the destinations using UNION ALL, as such:

SELECT o__continent as continent, o__country as country, o__state_name as state, o__city_name as city, o_ AS code
FROM db.trips
WHERE year(tripdate) = 2020
UNION ALL
SELECT d__continent as continent, d__country as country, d__state_name as state, d__city_name as city, d_ AS code
FROM db.trips
WHERE year(tripdate) = 2020

This produces a table that looks like this:

continent	country	state	city
North America	Canada	Alberta	Calgary
North America	Canada	Alberta	Edmonton
North America	United States	Arizona	Phoenix
North America	United States	California	Los Angeles
North America	United States	California	Los Angeles
North America	United States	California	Los Angeles
North America	United States	California	San Diego
North America	United States	California	Santa Rosa
North America	United States	Oregon	Portland
North America	United States	Washington	Spokane

I am seeking out how I might be able to group these together (obviously traditionally you would use GROUP BY continent, country, state, city), but also to count the instances of each column in separate columns. The ideal final result would look something like this (note: the Los Angeles entries are grouped):

continent	country	state	city	continent_count	country_count	state_count	city_count
North America	Canada	Alberta	Calgary	10	2	2	1
North America	Canada	Alberta	Edmonton	10	2	2	1
North America	United States	Arizona	Phoenix	10	8	1	1
North America	United States	California	Los Angeles	10	8	5	3
North America	United States	California	San Diego	10	8	5	1
North America	United States	California	Santa Rosa	10	8	5	1
North America	United States	Oregon	Portland	10	8	1	1
North America	United States	Washington	Spokane	10	8	1	1

I have been sifting for several days through other potential answers to this question on StackOverflow with little luck. Ideally I would not query the db.trips database over and over, as it is large. Is there a way to do this without losing too much efficiency?

Any help offered is most kindly and greatly appreciated.

Use `COUNT() OVER (PARTITION BY column)`. PS *`WHERE year(tripdate) = 2020`* - there is no `tripdate` column in your data... — Akina, Dec 03 '21 at 08:09
Regrettably I am seeking a solution for MySQL 5.7, so I can not use the `OVER` function. I also omitted the `tripdate` from my example; it is now posted. — GoOutside, Dec 03 '21 at 17:24
TLDRjust the comments and it looked like an analytic would work: so..."Any help offered is most kindly and greatly appreciated" https://stackoverflow.com/questions/1895110/row-number-in-mysql shows you how to simulate it using variables and a cross join or better a set statement I believe . Several options there... — xQbert, Dec 03 '21 at 17:34
Or maybe: https://dba.stackexchange.com/questions/94545/calculate-row-value-based-on-previous-and-actual-row-values as the above is for row_number... — xQbert, Dec 03 '21 at 17:39
What does it mean that the city_count is lower than the continent_count? — EdmCoff, Dec 03 '21 at 18:04
You say db.trips is large - how big is it? 1,000s, 100,000s or 10,000,000s rows? Saying a table is large means nothing when people have different interpretations. Avoid unnecessary coercion like `WHERE year(tripdate) = 2020` when it can be just as easily written as `WHERE tripdate BETWEEN '2020-01-01` AND '2020-12-31' which can use indices and does not require a function call for every row. — user1191247, Dec 03 '21 at 20:11

score 0 · Answer 1 · answered Dec 04 '21 at 15:23

Given that you said you are stuck on MySQL 5.7, your best bet is going to be a normal GROUP BY and then post processing the counts in your programming language of choice -

SELECT *, COUNT(*) AS l_count
FROM (
(
    SELECT o_continent as continent, o_country as country, o_state as state, o_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
 ) UNION ALL (
    SELECT d_continent as continent, d_country as country, d_state as state, d_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
)
) `locations`
GROUP BY continent, country, state, city
ORDER BY continent ASC, country ASC, state ASC, city ASC;

But if you want to build the final result set in SQL there are a few different approaches and this performed best with my test dataset -

SELECT `locations`.*,
    `continents`.`count` AS continent_count,
    `countries`.`count` AS country_count,
    `states`.`count` AS state_count,
    COUNT(*) AS city_count
FROM (
(
    SELECT o_continent as continent, o_country as country, o_state as state, o_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
 ) UNION ALL (
    SELECT d_continent as continent, d_country as country, d_state as state, d_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
)
) `locations`
INNER JOIN (
    -- DERIVED TABLE FOR COUNT OF CONTINENTS
    SELECT continent, SUM(cnt) AS `count` FROM (
        SELECT o_continent AS continent, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY continent
        UNION ALL
        SELECT d_continent AS continent, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY continent
    ) tmp
    GROUP BY continent
) `continents` ON `locations`.`continent` = `continents`.`continent`

INNER JOIN (
    -- DERIVED TABLE FOR COUNT OF COUNTRIES
    SELECT country, SUM(cnt) AS `count` FROM (
        SELECT o_country AS country, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY country
        UNION ALL
        SELECT d_country AS country, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY country
    ) tmp
    GROUP BY country
) `countries` ON `locations`.`country` = `countries`.`country`

INNER JOIN (
    -- DERIVED TABLE FOR COUNT OF STATES
    SELECT state, SUM(cnt) AS `count` FROM (
        SELECT o_state AS state, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY state
        UNION ALL
        SELECT d_state AS state, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY state
    ) tmp
    GROUP BY state
) `states` ON `locations`.`state` = `states`.`state`

GROUP BY `locations`.`continent`, `locations`.`country`, `locations`.`state`, `locations`.`city`
ORDER BY `locations`.`continent` ASC, `locations`.`country` ASC, `locations`.`state` ASC, `locations`.`city` ASC;

If you are running MySQL > 8.0 then using window functions is definitely the easiest and most performant solution -

SELECT *,
    SUM(COUNT(*)) OVER (PARTITION BY continent) AS continent_count,
    SUM(COUNT(*)) OVER (PARTITION BY country) AS country_count,
    SUM(COUNT(*)) OVER (PARTITION BY state) AS state_count,
    COUNT(*) AS city_count
FROM (
(
    SELECT o_continent as continent, o_country as country, o_state as state, o_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
 ) UNION ALL (
    SELECT d_continent as continent, d_country as country, d_state as state, d_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
)
) `locations`
GROUP BY continent, country, state, city
ORDER BY continent ASC, country ASC, state ASC, city ASC;

Counting multiple columns & groups from one UNION ALL query

1 Answers1