I have a table which contains 1 billion rows and would like to add top 1 row from each group to all rows.
But I'm facing Query exhausted resources at this scale factor.
error when I run the query.
MY_TABLE
+--------+---------+-------+
| COL_A | COL_B | COL_C |
+--------+---------+-------+
| item_a | group_a | 1 |
| item_a | group_b | 2 |
| item_b | group_a | 1 |
+--------+---------+-------+
What I want to get
+--------+---------+-------+----------------+
| COL_A | COL_B | COL_C | FIRST_ITEM_SEQ |
+--------+---------+-------+----------------+
| item_a | group_a | 1 | 1 |
| item_a | group_b | 2 | 1 |
| item_b | group_a | 1 | 1 |
+--------+---------+-------+----------------+
This is what I ran on Athena
SELECT ITEMS.*,
FIRST_ITEMS.COL_C AS FIRST_ITEM_DATE
FROM MY_TABLE ITEMS
LEFT JOIN (SELECT COL_A, COL_B, MIN(COL_C)
FROM MY_TABLE
GROUP BY COL_A, COL_B) FIRST_ITEMS
ON ITEMS.COL_A = FIRST_ITEMS.COL_A
AND ITEMS.COL_B = FIRST_ITEMS.COL_B
;
Could someone let me know better approach for this?