1

I have a code in Qubole that's taking almost 3 hours to execute. I am looking for some recommendations to decrease the code execution time.

WITH
    -- Get latest date - 10 days before as day 
d
AS (
    SELECT CAST(CONCAT (
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 1, 4),
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 6, 2),
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 9, 2)
                ) AS BIGINT) AS day,
        CAST(CONCAT (
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 1, 4),
                '-',
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 6, 2),
                '-',
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 9, 2)
                ) AS DATE) AS DATE,
        'FR' AS country
    )
SELECT 'Streaming' AS TRANSACTION,
    'Spotify' AS account,
    p_day,
    access,
    COUNT(DISTINCT customer_id) AS users,
    COUNT(*) AS units
FROM temp_1
WHERE day >= (
        SELECT day
        FROM d
        )
    AND country_code = (
        SELECT country
        FROM d
        )
GROUP BY 1,
    2,
    3,
    4

UNION ALL

SELECT 'Streaming' AS TRANSACTION,
    'Deezer' AS account,
    p_day,
    CASE 
        WHEN offer_code IN ('APP', 'BAO', 'BDP', 'BDS', 'BMO', 'BMS', 'BMW', 'BPF', 'BPP', 'BPR', 'BSO', 'BWE', 'BWP', 'BWS', 'DEE', 'DEP', 'ETT', 'EXT', 'FFX', 'IOS', 'OT1', 'PBH', 'PE1', 'PE2', 'PEM', 'PLS', 'PRM', 'PSC', 'PTP', 'SDP', 'SMG', 'SPF', 'SPP', 'SPR', 'SUP', 'SWE', 'SWP', '3M', 'FAM', 'GOO', 'GOF', 'HFP', 'HFF', 'HFI')
            THEN 'premium'
        WHEN offer_code IN ('BFR', 'MFS', 'MOD', 'SMR')
            THEN 'free'
        ELSE NULL
        END AS access,
    COUNT(DISTINCT masked_consumer_id) AS users,
    SUM(units_sold_streams) AS streams
FROM temp_2
WHERE day >= (
        SELECT day
        FROM d
        )
    AND country_code = (
        SELECT country
        FROM d
        )
GROUP BY 1,
    2,
    3,
    4

UNION ALL

SELECT 'Streaming' AS TRANSACTION,
    'Apple Music' AS account,
    ingest_datestamp AS p_day,
    'premium' AS access,
    COUNT(DISTINCT anonymized_person_id) AS users,
    COUNT(*) AS streams
FROM temp_streams1
WHERE ingest_datestamp >= (
        SELECT DATE
        FROM d
        )
    AND country_code = (
        SELECT country
        FROM d
        )
GROUP BY 1,
    2,
    3,
    4
leftjoin
  • 36,950
  • 8
  • 57
  • 116
Flash
  • 11
  • 1
  • 1
    At a very minimum you should [include the actual Execution Plan](https://stackoverflow.com/a/7359705/1260204), you could use [Paste the Plan](https://www.brentozar.com/pastetheplan/) and share the link in your question. Also [try to read it yourself](https://stackoverflow.com/a/759097/1260204), maybe you can figure out the performance issue(s) with your query. Finally include the [schema DDL](https://en.wikipedia.org/wiki/Data_definition_language). – Igor Apr 11 '19 at 17:48
  • What database are you using and version? Is it Presto? It seems like presto syntax. Or is it mysql... – leftjoin May 14 '19 at 09:19
  • `GROUP BY 1, 2, 3, 4` is deprecated SQL standard syntax, i believe this is deprecated sinds SQL 1999... Also this is very error prone and it does not read easy. – Raymond Nijland May 14 '19 at 15:45

1 Answers1

0

This will not help much with query performance optimizing, but will help to simplify code a bit. Dates calculation can be simplified (tested on Presto)

cast(DATE_FORMAT(DATE_ADD('day', -10, CURRENT_DATE),'%Y%m%d') as bigint) as day,
DATE_ADD('day', -10, CURRENT_DATE)                                       as date

And to improve performance I'd recommend to partition your tables by dates and depending of the data size on country codes also and pass dates calculated as a parameters, not as subqueries to make sure partition pruning works.

leftjoin
  • 36,950
  • 8
  • 57
  • 116