0

I have a table that contains whenever an email has been open. When a user opens an email multiple times you will have rows that shared the same values except the CREATED_DATE (OPENED_DATE of an email).

TABLE 1 - OPENED_EMAILS

CREATED_DATE EMAIL_CODE USER_ID
10-5-2022 E1 U1
15-5-2022 E2 U2
17-5-2022 E2 U3
17-5-2022 E3 U4
20-5-2022 E2 U3
22-5-2022 E3 U4
23-5-2022 E3 U4

I would like to make a query that retrieves only the first time an email has been opened by an user (MIN_CREATED_DATE of an opened email).

Expected result:

TABLE 2 - FIRST_OPENED_EMAILS

CREATED_DATE EMAIL_CODE USER_ID
10-5-2022 E1 U1
15-5-2022 E2 U2
17-5-2022 E2 U3
17-5-2022 E3 U4
Maxperezt
  • 67
  • 7
  • Does this answer your question? [Fetch the rows which have the Max value for a column for each distinct value of another column](https://stackoverflow.com/questions/121387/fetch-the-rows-which-have-the-max-value-for-a-column-for-each-distinct-value-of) – philipxy Mar 10 '23 at 19:07

2 Answers2

1

If you want the first time EMAIL_CODE E3 AND USER_ID U4 you will need to do something to link them, I would concatenate them then group by that concatenation

Select
CONCAT(EMAIL_CODE,USER_ID) as link
min(CREATED_DATE)
group by link
Max_Stone
  • 254
  • 5
1

You can try one of followings.

SELECT ARRAY_AGG(t ORDER BY PARSE_DATE('%d-%m-%Y', CREATED_DATE))[SAFE_OFFSET(0)].*
  FROM opened_email t
 GROUP BY t.EMAIL_CODE, t.USER_ID;
SELECT FORMAT_DATE(
         '%d-%-m-%Y',
         MIN(PARSE_DATE('%d-%m-%Y', CREATED_DATE))
       ) AS CREATED_DATE,
       EMAIL_CODE,
       USER_ID
  FROM opened_email t
 GROUP BY 2, 3;
  • format string %-m will display month without preceding 0.
SELECT ANY_VALUE(t HAVING MIN PARSE_DATE('%d-%m-%Y', CREATED_DATE)).*
  FROM opened_email t 
 GROUP BY t.EMAIL_CODE, t.USER_ID;
 SELECT * FROM opened_email 
QUALIFY MIN(CREATED_DATE) OVER (PARTITION BY EMAIL_CODE, USER_ID) = CREATED_DATE;

Query results

enter image description here

Jaytiger
  • 11,626
  • 2
  • 5
  • 15
  • Thank you! Out of curiosity what would be the most efficient one? – Maxperezt Mar 01 '23 at 11:43
  • 1
    It's highly depends on the data and kind of controversal topic. :), so, you should test it with your real dataset. Anyway 2nd standard approach will show good performance and first one using an array will show relatively poor performance in this case, I guess. just FYI. – Jaytiger Mar 01 '23 at 12:24