0

I am using Oracle SQL Developer and I need to pull counts from a very large database. I need to count both the number of visits and the number of unique visitors at each of 5 sites (and in total) for each of 10 quarters (and in total) - resulting in 120 counts. Each row in this database represents one visit, and the visitors each have a unique visitor_ID.

Currently, I have one line for each count, but I need to make modifications and I don't want to do it in such an inefficient way this time.

select
sum(case when visit_date between '01-JAN-2019 00:00:00' and '31-MAR-2019 00:00:00' and site=site1 then 1 else 0 end) as 19Q1_visits_site1,
count(distinct case when visit_date between '01-JAN-2019 00:00:00' and '31-MAR-2019 00:00:00' and site=site1 then visitor_id) as 19Q1_unique_site1,
[...]
from visitdata
where [additional qualifiers];

If possible, I would like to create something along the lines of:

19Q1 = visit_date between '01-JAN-2019 00:00:00' and '31-MAR-2019 00:00:00'
19Q2 = visit_date between '01-APR-2019 00:00:00' and '30-JUN-2019 00:00:00'
[...]
allQ = visit_date between '01-JAN-2019 00:00:00' and '30-SEP-2021 00:00:00'

S1 = site in (site1)
[...]
allS = site in (site1, site2, site3, site4, site5)

sites = [S1, S2, S3, S4, S5, allS]
quarters = [19Q1, 19Q2, ..., allQ]

for s in sites:
    for q in quarters:
        select
        sum(case when q and s then 1 else 0 end) as (str(q) + 'visits' + str(s)),
        count(distinct case when q and s then visitor_id) as (str(q) + 'unique' + str(s))
        from visitdata
        where [additional qualifiers];

I know SQL doesn't do for loops. Any advice would be fantastic so I don't have to create another embarrassing script with almost 200 lines. Thanks!

moogie
  • 37
  • 7

2 Answers2

0

Very briefly, use a built-in or user-defined function that takes a date and returns the quarter to which that date belongs. (SQL Server supports "quarter" as a datepart, but you can probably write this yourself if Oracle doesn't support it. You could also add visit_quarter to your table as a materialized computed column and even index it if you use this a lot.) Then you can write a single grouping query along the lines of

SELECT
  site,
  quarter(visit_date) as Q,
  COUNT(visitor_id) as numvisits,
  COUNT(DISTINCT visitor_id) AS numDistinctVisitors
FROM T
WHERE <additional conditions>
GROUP BY site, quarter(visit_date)
ORDER BY site, Q
Steve Kass
  • 7,144
  • 20
  • 26
  • Thank you! Looks like Oracle doesn't use support quarter, and I'm using a modified fiscal year anyway. Do you know how I can define quarters when the visit_date is stored as yyyymmdd? I am having trouble creating a materialized computed column. – moogie Oct 29 '21 at 18:55
  • When you say visit_date is stored as yyyymmdd, you are saying it is stored as a string, not a date? (If it is stored as a date, it is not stored "in a format.") Obviously, if it is intended to hold a date, it should be stored as a date, but if it's a string, I guess you could get the quarter based on the month with something like `1 + cast(substring(visit_date,5,2) as int) / 3`, which will give an integer division result if Oracle follows the SQL standard. – Steve Kass Oct 30 '21 at 20:13
  • Also, Oracle does seem to support quarter as a datetime unit: https://docs.oracle.com/cd/E37502_01/server.751/es_eql/src/ceql_functions_date_extract.html. – Steve Kass Oct 30 '21 at 20:15
  • Right, yeah, it's stored as a string. – moogie Nov 02 '21 at 17:49
0

For finding quarters you can use TO_CHAR() on dates, and for changing the start date and end date of a quarter, you can use ADD_MONTHS() eg

Table

create table randomdates( date_ )
as
select sysdate - dbms_random.value( 1, 600 )
from dual 
connect by level <= 50 ;

Query

select date_
, to_char( date_, 'YYYY-Q' ) quarter
from randomdates 
;

-- result
DATE_       QUARTER
09/23/2021  2021-3
09/24/2020  2020-3
03/23/2020  2020-1
03/29/2021  2021-1
11/29/2020  2020-4
03/05/2021  2021-1
04/08/2021  2021-2
...

GROUP BY should also be possible - as @Steve Kass suggested.

select to_char( date_, 'YYYY-Q' ), count(*)
from randomdates 
group by to_char( date_, 'YYYY-Q' )
order by 1 desc
;


TO_CHAR(DATE_,'YYYY-Q') COUNT(*)
2021-4  6
2021-3  13
2021-2  22
2021-1  13
2020-4  20
2020-3  12
2020-2  10
2020-1  4
...

From your comment:

I'm using a modified fiscal year anyway. Do you know how I can define quarters when the visit_date is stored as yyyymmdd?

If you need different start/end days for the, ADD_MONTHS() could help eg the modified_quarters in the following query start a month later than the "standard" quarters. Regarding the dates: in Oracle, they contain the century, the year within the century, the month, the day of the month, the hour, the minute and the second (7 bytes). You can just use TO_CHAR() and pick up whichever component (of the date) you need using the "Format Model" eg 'Q' in the example below.

-- Query executed in APEX.  
-- Column date_ : no formatting (compare the output to the same query in the dbfiddle).
select date_ 
, to_char( date_, 'YYYY-Q' ) quarter
, to_char( date_, 'YY' ) || 'Q' || to_char( date_, 'Q' ) quarter_
, to_char( add_months( date_, 1 ), 'YYYY-Q' ) modified_quarter
from randomdates 
;

DATE_       QUARTER QUARTER_ MODIFIED_QUARTER
09/23/2021  2021-3  21Q3     2021-4
09/24/2020  2020-3  20Q3     2020-4
03/23/2020  2020-1  20Q1     2020-2
03/29/2021  2021-1  21Q1     2021-2
11/29/2020  2020-4  20Q4     2020-4
03/05/2021  2021-1  21Q1     2021-2

For calculating subtotals and total (counts) per site, you could use GROUP BY ROLLUP() eg

Table & data

-- Caution: dates in this table are not the same as in the randomdates table.
create table sitesanddates( site_, date_ )
as
select 
  trunc( dbms_random.value( 1, 6 ) )
, sysdate - dbms_random.value( 1, 600 )
from dual 
connect by level <= 50 ;

-- group by site and quarter
select site_, to_char( date_, 'YYYY-Q' ), count(*)
from sitesanddates 
group by site_, to_char( date_, 'YYYY-Q' )
order by 1, 2
;

SITE_   TO_CHAR(DATE_,'YYYY-Q') COUNT(*)
1       2020-1                  1
1       2020-4                  3
1       2021-1                  1
1       2021-2                  1
1       2021-3                  2
2       2020-1                  1
2       2020-2                  1

GROUP BY ROLLUP

select site_, to_char( date_, 'YYYY-Q' ) q_, count(*)
from sitesanddates 
group by rollup( site_, to_char( date_, 'YYYY-Q' ) )
order by 1, 2
;


SITE_   Q_  COUNT(*)
1   2020-1  1
1   2020-4  3
1   2021-1  1
1   2021-2  1
1   2021-3  2
1   -       8  -- <- subtotal for site 1
2   2020-1  1
2   2020-2  1
...
5   2020-4  2
5   2021-1  2
5   2021-2  1
5   2021-4  1 
5   -      10  -- <- subtotal for site 5
-   -      50  -- <- grand total

link to dbfiddle

stefan
  • 2,182
  • 2
  • 13
  • 14
  • Nice answer. I would note, though, that it’s incorrect to say that dates are “stored as...” any string or human-readable format such as YYYYMMDD or otherwise. You can convert dates to strings in various formats with TO_CHAR, but dates are not stored in formatted sequences of digit characters (with or without hyphens, colons, etc.). See, for example, https://stackoverflow.com/questions/13568193/how-are-dates-stored-in-oracle/13568348#13568348 There may be a default string format that is used for human-readable presentation via one or another IDE, but that might well depend on the user’s locale. – Steve Kass Oct 31 '21 at 00:13
  • @SteveKass From the documentation: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Data-Types.html DATE Valid date range from January 1, 4712 BC, to December 31, 9999 AD. The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is fixed at 7 bytes. This data type contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It does not have fractional seconds or a time zone. – stefan Oct 31 '21 at 08:04
  • 1
    @SteveKass Thanks for your feedback! I have re-worded the part of the answer you are referring to. – stefan Oct 31 '21 at 08:16