What is the alternative to side inputs in Apache Beam?

Question

I am trying to join multiple Kafka streams and lookup tables using Apache Beam. I'm using side inputs to handle the lookup tables, and everything worked with the direct runner. But when I tried to run it on the Spark or Flink runner, I learned that side inputs are not supported. Here are a few links to the related mailing-list thread and Jira issues.

http://mail-archives.apache.org/mod_mbox/beam-user/201605.mbox/%3C573EFC2F.6000708@nanthrax.net%3E

https://issues.apache.org/jira/browse/FLINK-6131

https://issues.apache.org/jira/browse/BEAM-2112

Is there a way to use side inputs, or a workaround for this?

Can I use stateful processing for this? I know that state is available per key and per window, but is there a way to tweak it to work differently?

Can I use a caching database such as memcached and fetch the data while processing each record?

Any suggestions are highly appreciated.

Thanks,

score 0 · Answer 1 · answered Aug 23 '19 at 18:13

There is an alternative called a seekable join in Beam SQL. It is similar to a side-input join, but it only requires that one side of the join be seekable [1].

So it depends on whether you can construct a table that implements the seek API (the underlying implementation could simply be an API call).
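To make the idea concrete, here is a minimal plain-Python sketch of a seek-based lookup join. It is not the Beam SQL API: `SeekableTable`, `seek`, `seekable_join`, and `fetch_user` are hypothetical names chosen for illustration, and the in-memory dictionary stands in for whatever backend (an API call, memcached, a database) you would actually seek against.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical names throughout; this only illustrates the concept,
# it is not the Beam SQL interface.
Row = Dict[str, object]


class SeekableTable:
    """A lookup table that is queried by key on demand instead of being loaded whole."""

    def __init__(self, fetch: Callable[[object], List[Row]]):
        # `fetch` stands in for the real backend (API call, memcached, DB query).
        self._fetch = fetch
        self._cache: Dict[object, List[Row]] = {}

    def seek(self, key: object) -> List[Row]:
        # Remember results so repeated keys do not hit the backend again.
        # The cache here is unbounded and never expires (an assumption for brevity).
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def seekable_join(stream: Iterable[Row], table: SeekableTable, key_field: str) -> Iterator[Row]:
    """Inner-join each streamed record with the rows found by seeking its key."""
    for record in stream:
        for match in table.seek(record[key_field]):
            yield {**record, **match}


# In-memory stand-in for the lookup backend.
users = {1: [{"name": "alice"}], 2: [{"name": "bob"}]}
calls = []  # records which keys actually reached the backend


def fetch_user(key):
    calls.append(key)
    return users.get(key, [])


table = SeekableTable(fetch_user)
events = [
    {"user_id": 1, "action": "click"},
    {"user_id": 3, "action": "view"},   # no match, dropped by the inner join
    {"user_id": 1, "action": "buy"},    # served from the cache
]
joined = list(seekable_join(events, table, "user_id"))
```

Only the streamed side is iterated; the lookup side is touched one key at a time, which is why only that side needs to be seekable. In a real pipeline the cache would probably need a size bound and an expiry so that updates to the lookup table are eventually picked up.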
