Postgres version 9.4.18, PostGIS Version 2.2.
Here are the tables I'm working with (and can unlikely make significant changes to the table structure):
Table ltg_data (spans 1988 to 2018):
Column | Type | Modifiers
----------+--------------------------+-----------
intensity | integer | not null
time | timestamp with time zone | not null
lon | numeric(9,6) | not null
lat | numeric(8,6) | not null
ltg_geom | geometry(Point,4269) |
Indexes:
"ltg_data2_ltg_geom_idx" gist (ltg_geom)
"ltg_data2_time_idx" btree ("time")
Size of ltg_data (~800M rows):
ltg=# select pg_relation_size('ltg_data');
pg_relation_size
------------------
149729288192
Table counties:
Column | Type | Modifiers
-----------+-----------------------------+--------------------------------- -----------------------
gid | integer | not null default
nextval('counties_gid_seq'::regclass)
objectid_1 | integer |
objectid | integer |
state | character varying(2) |
cwa | character varying(9) |
countyname | character varying(24) |
fips | character varying(5) |
time_zone | character varying(2) |
fe_area | character varying(2) |
lon | double precision |
lat | double precision |
the_geom | geometry(MultiPolygon,4269) |
Indexes:
"counties_pkey" PRIMARY KEY, btree (gid)
"counties_gix" gist (the_geom)
"county_cwa_idx" btree (cwa)
"countyname_cwa_idx" btree (countyname)
I have a query that calculates the total number of rows per day of the year (month-day) spanning the 30 years. With the help of Stackoverflow, the query to get these counts is working fine. Here's the query and results, using the following function.
Function:
CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT to_char($1, 'MMDD')::int$$;
Query:
SELECT d.mmdd, COALESCE(ct.ct, 0) AS total_count
FROM (
SELECT f_mmdd(d::date) AS mmdd -- ignoring the year
FROM generate_series(timestamp '2018-01-01' -- any dummy year
, timestamp '2018-12-31'
, interval '1 day') d
) d
LEFT JOIN (
SELECT f_mmdd(time::date) AS mmdd, count(*) AS ct
FROM counties c
JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
WHERE cwa = 'MFR'
GROUP BY 1
) ct USING (mmdd)
ORDER BY 1;
Results:
mmdd total_count
725 | 2126
726 | 558
727 | 2
728 | 2
729 | 2
730 | 0
731 | 0
801 | 0
802 | 10
Desired Results: I'm trying to find other statistical information about the counts for the days of the year. For instance, I know on July 25 (725 in the table below) that the total count over the many years that are in the table is 2126. What I'm looking for is the max daily count for July 25 (725), percent of years that that day is not zero, the min, percent years where count(*) is not zero, percentiles (10th percentile, 25th percentile, 50th percentile, 75th percentile, 90th percentile, and stdev would be useful too). It would be good to see what year the max_daily occurred. I guess if there haven't been any counts for that day in all the years, the year_max_daily would be blank or zero.
mmdd total_count max daily year_max_daily percent_years_count_not_zero 10th percentile_daily 90th percentile_daily
725 | 2126 1000 1990 30 15 900
726 | 558 120 1992 20 10 80
727 | 2 1 1991 2 0 1
728 | 2 1 1990 2 0 1
729 | 2 1 1989 2 0 1
730 | 0 0 0 0 0
731 | 0 0 0 0 0
801 | 0 0 0 0 0
802 | 10 10 1990 0 1 8
What I've tried thus far just isn't working. It returns the same results as total. I think it's because I'm just trying to get an avg after the totals have already been calculated, so I'm not really looking at the counts for each day of each year and finding the average.
Attempt:
SELECT AVG(CAST(total_count as FLOAT)), day
FROM
(
SELECT d.mmdd as day, COALESCE(ct.ct, 0) as total_count
FROM (
SELECT f_mmdd(d::date) AS mmdd
FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 day') d
) d
LEFT JOIN (
SELECT mmdd, avg(q.ct) FROM (
SELECT f_mmdd((time at time zone 'utc+12')::date) as mmdd, count(*) as ct
FROM counties c
JOIN ltg_data d on ST_contains(c.the_geom, d.ltg_geom)
WHERE cwa = 'MFR'
GROUP BY 1
)
) as q
ct USING (mmdd)
ORDER BY 1
Thanks for any help!