1

I have a hive table with IDs and associated events that look like below. The table can have multiple event_number for the same ID -

ID    event_number    Date
ABC      1           2022-08-01
ABC      2           2022-08-01
ABC      3           2022-08-01
DEF      1           2022-08-01
GHI      2           2022-08-02
DEF      3           2022-08-01

I want to find unique ids that have events 1 and 2 in a day

  • Here the output would be ABC because that is the only ID with both event 1 and event 2 for a given date.
  • It cannot be DEF or GHI since they either have event 1 or 2

Here is the query I came up for this -

select distinct ID from table where event_number=1 and date=2022-08-01 
and ID in( Select ID from table where event_number=2 and date=2022-08-01);

Is there a more elegant or efficient way to do this?

SRm
  • 49
  • 9
  • I do not know all your test cases but can you try this? `select id from mytable group by id,date having count(*)>1` ? This will give you data if there is multiple same id and date having count>1. This will not work if you have multiple same event number, id, date. – Koushik Roy Aug 12 '22 at 05:01
  • Thanks for your response! This will not work since I can have multiples of same event id, date and id – SRm Aug 12 '22 at 07:11
  • can you pls add more examples to your question which covers all your test cases? – Koushik Roy Aug 12 '22 at 08:16
  • I have made edits above. -Mainly, I only want IDs that have event 1 and event 2 associated with it, -The ID may have other events as well but I am only looking at 1 and 2 - I need a list of unique IDs that have 1 and 2 events associated with it -I am currently using IN clause, I am wondering if there is a more efficient way to get what I need since I deal with huge amounts of data – SRm Aug 12 '22 at 19:06

2 Answers2

0

First filer the records which are matching your event filter , then aggregate on dates and get the rows where the event_count is more than 1 for the given date. e.g.

select id,`date`,count(distinct event_number) event_count from (
select id,event_number,`date` from table where id in (1,2)
) a group by id,`date` having event_count>1;
0

The SQL as below:

select 
    id
from (
    select
        id,
        date,
        count(disintct event_number) as event_no_cnt
    from
        table 
    where
        event_number in (1,2)
    group by
        id,
        date
    having(count(disintct event_number)=2)
) tmp 
group by id
Shawn.X
  • 1,323
  • 6
  • 15