I have data coming from two different Kafka topics, served by different brokers, with each topic having different numbers of partitions. One stream has events about ads being served, the other has clicks:
ad_serves: ad_id, ip, sTime
ad_clicks: ad_id, ip, cTime
The documentation for process functions includes a section on implementing low-level joins with a CoProcessFunction
or KeyedCoProcessFunction
, but I'm not sure how to set that up.
I'm also wondering if one of Flink's SQL Joins could be used here. I'm interested both in simple joins like
SELECT s.ad_id, s.sTime, c.cTime
FROM ad_serves s, ad_clicks c
WHERE s.ad_id = c.ad_id
as well as analytical queries based on ads clicked on within 5 seconds of being served:
SELECT s.ad_id
FROM ad_serves s, ad_clicks c
WHERE
s.ad_id = c.ad_id AND
s.ip = c.ip AND
c.cTime BETWEEN s.sTime AND
s.sTime + INTERVAL ‘5’ SECOND;