I have a query as follows:
SELECT
e.event_id,
(
SELECT
event_id
FROM atomic.events
WHERE
domain_userid = e.domain_userid
ORDER BY collector_tstamp
LIMIT 1
) AS parent_event_id
FROM snowplow_intermediary.events_enriched e
LIMIT 1
I'm trying to find the first event for each user. This is fairly fast ~5s.
If I'm trying to find the user by user_ipaddress
instead of domain_userid
it's way slower. After 300s it's not done.
SELECT
e.event_id,
(
SELECT
event_id
FROM atomic.events
WHERE
user_ipaddress = e.user_ipaddress
ORDER BY collector_tstamp
LIMIT 1
) AS parent_event_id
FROM snowplow_intermediary.events_enriched e
LIMIT 1
The data type is domain_userid varchar(36) encode runlength
and user_ipaddress varchar(45) encode runlength
.
Here's EXPLAIN's for the queries:
https://gist.github.com/mortenstarfly/4ce3be9b3a19aac2601a
https://gist.github.com/mortenstarfly/2008b0f737259df30695
I really would like to speed up the second query. Any suggestions?