0

I am attempting to get counts of items in multiple columns in a union all-ed query.

I have a MySQL database of trips taken, with location data for origin and destination for each trip. The columns are as such:

o_continent o_country o_state o_city d_continent d_country d_state d_city tripdate
North America United States California San Diego North America United States Arizona Phoenix 2020-10-01
North America United States California Los Angeles North America United States California Santa Rosa 2020-04-03
North America United States California Los Angeles North America United States Washington Spokane 2020-03-01
North America United States Oregon Portland North America Canada Alberta Calgary 2020-01-22
North America Canada Alberta Edmonton North America United States California Los Angeles 2020-04-01

I am merging the origins and the destinations using UNION ALL, as such:

SELECT o__continent as continent, o__country as country, o__state_name as state, o__city_name as city, o_ AS code
FROM db.trips
WHERE year(tripdate) = 2020
UNION ALL
SELECT d__continent as continent, d__country as country, d__state_name as state, d__city_name as city, d_ AS code
FROM db.trips
WHERE year(tripdate) = 2020

This produces a table that looks like this:

continent country state city
North America Canada Alberta Calgary
North America Canada Alberta Edmonton
North America United States Arizona Phoenix
North America United States California Los Angeles
North America United States California Los Angeles
North America United States California Los Angeles
North America United States California San Diego
North America United States California Santa Rosa
North America United States Oregon Portland
North America United States Washington Spokane

I am seeking out how I might be able to group these together (obviously traditionally you would use GROUP BY continent, country, state, city), but also to count the instances of each column in separate columns. The ideal final result would look something like this (note: the Los Angeles entries are grouped):

continent country state city continent_count country_count state_count city_count
North America Canada Alberta Calgary 10 2 2 1
North America Canada Alberta Edmonton 10 2 2 1
North America United States Arizona Phoenix 10 8 1 1
North America United States California Los Angeles 10 8 5 3
North America United States California San Diego 10 8 5 1
North America United States California Santa Rosa 10 8 5 1
North America United States Oregon Portland 10 8 1 1
North America United States Washington Spokane 10 8 1 1

I have been sifting for several days through other potential answers to this question on StackOverflow with little luck. Ideally I would not query the db.trips database over and over, as it is large. Is there a way to do this without losing too much efficiency?

Any help offered is most kindly and greatly appreciated.

GoOutside
  • 103
  • 1
  • 2
  • 11
  • Use `COUNT() OVER (PARTITION BY column)`. PS *`WHERE year(tripdate) = 2020`* - there is no `tripdate` column in your data... – Akina Dec 03 '21 at 08:09
  • Regrettably I am seeking a solution for MySQL 5.7, so I can not use the `OVER` function. I also omitted the `tripdate` from my example; it is now posted. – GoOutside Dec 03 '21 at 17:24
  • TLDRjust the comments and it looked like an analytic would work: so..."Any help offered is most kindly and greatly appreciated" https://stackoverflow.com/questions/1895110/row-number-in-mysql shows you how to simulate it using variables and a cross join or better a set statement I believe . Several options there... – xQbert Dec 03 '21 at 17:34
  • Or maybe: https://dba.stackexchange.com/questions/94545/calculate-row-value-based-on-previous-and-actual-row-values as the above is for row_number... – xQbert Dec 03 '21 at 17:39
  • What does it mean that the city_count is lower than the continent_count? – EdmCoff Dec 03 '21 at 18:04
  • You say db.trips is large - how big is it? 1,000s, 100,000s or 10,000,000s rows? Saying a table is large means nothing when people have different interpretations. Avoid unnecessary coercion like `WHERE year(tripdate) = 2020` when it can be just as easily written as `WHERE tripdate BETWEEN '2020-01-01` AND '2020-12-31' which can use indices and does not require a function call for every row. – user1191247 Dec 03 '21 at 20:11

1 Answers1

0

Given that you said you are stuck on MySQL 5.7, your best bet is going to be a normal GROUP BY and then post processing the counts in your programming language of choice -

SELECT *, COUNT(*) AS l_count
FROM (
(
    SELECT o_continent as continent, o_country as country, o_state as state, o_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
 ) UNION ALL (
    SELECT d_continent as continent, d_country as country, d_state as state, d_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
)
) `locations`
GROUP BY continent, country, state, city
ORDER BY continent ASC, country ASC, state ASC, city ASC;

But if you want to build the final result set in SQL there are a few different approaches and this performed best with my test dataset -

SELECT `locations`.*,
    `continents`.`count` AS continent_count,
    `countries`.`count` AS country_count,
    `states`.`count` AS state_count,
    COUNT(*) AS city_count
FROM (
(
    SELECT o_continent as continent, o_country as country, o_state as state, o_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
 ) UNION ALL (
    SELECT d_continent as continent, d_country as country, d_state as state, d_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
)
) `locations`
INNER JOIN (
    -- DERIVED TABLE FOR COUNT OF CONTINENTS
    SELECT continent, SUM(cnt) AS `count` FROM (
        SELECT o_continent AS continent, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY continent
        UNION ALL
        SELECT d_continent AS continent, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY continent
    ) tmp
    GROUP BY continent
) `continents` ON `locations`.`continent` = `continents`.`continent`

INNER JOIN (
    -- DERIVED TABLE FOR COUNT OF COUNTRIES
    SELECT country, SUM(cnt) AS `count` FROM (
        SELECT o_country AS country, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY country
        UNION ALL
        SELECT d_country AS country, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY country
    ) tmp
    GROUP BY country
) `countries` ON `locations`.`country` = `countries`.`country`

INNER JOIN (
    -- DERIVED TABLE FOR COUNT OF STATES
    SELECT state, SUM(cnt) AS `count` FROM (
        SELECT o_state AS state, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY state
        UNION ALL
        SELECT d_state AS state, COUNT(*) AS cnt FROM trips WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31' GROUP BY state
    ) tmp
    GROUP BY state
) `states` ON `locations`.`state` = `states`.`state`

GROUP BY `locations`.`continent`, `locations`.`country`, `locations`.`state`, `locations`.`city`
ORDER BY `locations`.`continent` ASC, `locations`.`country` ASC, `locations`.`state` ASC, `locations`.`city` ASC;

If you are running MySQL > 8.0 then using window functions is definitely the easiest and most performant solution -

SELECT *,
    SUM(COUNT(*)) OVER (PARTITION BY continent) AS continent_count,
    SUM(COUNT(*)) OVER (PARTITION BY country) AS country_count,
    SUM(COUNT(*)) OVER (PARTITION BY state) AS state_count,
    COUNT(*) AS city_count
FROM (
(
    SELECT o_continent as continent, o_country as country, o_state as state, o_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
 ) UNION ALL (
    SELECT d_continent as continent, d_country as country, d_state as state, d_city as city
    FROM trips
    WHERE tripdate BETWEEN '2020-01-01' AND '2020-12-31'
)
) `locations`
GROUP BY continent, country, state, city
ORDER BY continent ASC, country ASC, state ASC, city ASC;
user1191247
  • 10,808
  • 2
  • 22
  • 32