To retrieve the supplementary information of primary keys, I came up with two ways of doing this.
One is to use GROUP BY
with aggregation function
[1] and the other is to use PARTITION BY
and a window/analytic function
(~ row_number) [2].
I am suspecting GROUP BY
to be the lower performance way due to the aggregation function (might not be true since the PARTITION BY
way uses subquery), however, I am not sure which would be the better way of appending supplementary columns.
- Which would be the better way in terms of time complexity and performance?
- Also, is there a better way of appending the auxiliary columns?
Example case: to retrieve the users' last login info (such as ip, country_code, ...)
Possible Solution 1: GROUP BY
with an aggregation function
#standardSQL
SELECT
user_id, -- primary key
ARRAY_AGG(
STRUCT(
ip, country_code, os, channel_type
)
ORDER BY login_timestamp DESC LIMIT 1
)[OFFSET(0)].*
FROM user_login_info_table
GROUP BY user_id
Possible Solution 2: PARTITION BY
with a subquery
#standardSQL
WITH
login_log AS (
SELECT
user_id,
ip, country_code, os, channel_type,
ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_timestamp DESC
) AS rn
FROM user_login_info_table
)
SELECT user_id, ip, country_code, os, channel_type
FROM login_log
WHERE rn = 1
c.f. The above example is an oversimplified example, which ignored other aggregation fields.
References