Having an RDS Postgresql 12.5 and working on a table (app_events
) with a JSONB column (metadata
), the JSON data may vary based on the event name, see the structure below.
CREATE TABLE IF NOT EXISTS "public".app_events (
id uuid DEFAULT uuid_generate_v4() NOT NULL,
event_id text NOT NULL,
name text NOT NULL,
creation_time timestamp without time zone NOT NULL,
creation_time_in_milliseconds bigint NOT NULL,
metadata jsonb NOT NULL,
PRIMARY KEY(id)
);
There are default indexes (btree
) on event_id
and name
.
Based on the name
column we create views to normalize the data from JSON into tabular format, an example is below.
CREATE OR REPLACE VIEW "public".charity_created AS
SELECT app_events.id,
app_events.event_id,
app_events.name,
app_events.creation_time,
date(app_events.creation_time) AS created_date,
(app_events.metadata ->> 'aggregateId'::text) AS user_id,
(app_events.metadata ->> 'url'::text) AS url,
(app_events.metadata ->> 'name'::text) AS charity_name,
(app_events.metadata ->> 'about'::text) AS charity_about,
(app_events.metadata ->> 'country'::text) AS country,
(app_events.metadata ->> 'category'::text) AS category,
(app_events.metadata ->> 'currencyCode'::text) AS currencycode,
(app_events.metadata ->> 'isPayItForwardPartner'::text) AS is_pay_it_forward_partner,
(app_events.metadata ->> 'isCampaignDonationPartner'::text) AS is_campaign_donation_partner
FROM public.app_events
WHERE (app_events.name = 'CharityCreated'::text)
ORDER BY (date(app_events.creation_time)) DESC;
And now you can run queries like the below.
SELECT *
FROM "public".charity_created
WHERE charity_name == 'some_charity_name'
In addition, we create joins or unions between views and started noticing latency on reads/queries up to an hour sometimes, no timeouts but returning data can be challenging, definitely a big hit in our reporting team.
The question and knowledge I am looking for are, where can I create (or should create) indexes to improve the read latency; Debating with two findings so far.
- On the actual JSONB column (
metadata
) doesn't make sense, so maybe a solution is to start creating indexes based on the specific schema per event name like this answer - Create
materialized views
and pay the price to refresh the copy of data (every night) while creating indexes on the materialized views, similar to this answer - Maybe keep the
views
and create indexes on these