I have growing data in GCS, and a batch job will run, let's say daily, to process an increment of roughly 1 million articles. For each key I need to fetch additional information from BigTable (which contains billions of records). Is it feasible to simply do a lookup for every item in a map operation (see the sketch below)? Does it make sense to batch those lookups and perform something like a bulk read? Or what is the best way to handle this use case with Scio/Beam?
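To make the per-element option concrete, this is roughly what I have in mind, using the Java Bigtable client from a Scio job. It's only a sketch: the project/instance/table names, the one-row-key-per-line input format, and the FOUND/MISSING output are placeholders.

```scala
import com.google.cloud.bigtable.data.v2.BigtableDataClient
import com.spotify.scio.ContextAndArgs

object EnrichArticles {

  // Placeholder identifiers -- substitute your own project/instance/table.
  private val ProjectId  = "my-project"
  private val InstanceId = "my-instance"
  private val TableId    = "article-metadata"

  // One Bigtable client per worker JVM; lazy so it is created on the worker
  // rather than serialized with the pipeline graph.
  @transient private lazy val bigtable: BigtableDataClient =
    BigtableDataClient.create(ProjectId, InstanceId)

  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))          // assume one article row key per line
      .map { key =>
        // Point read; readRow returns null when the key is absent.
        val row = Option(bigtable.readRow(TableId, key))
        s"$key\t${row.fold("MISSING")(_ => "FOUND")}"
      }
      // A bulk-read variant would group keys (e.g. via Beam's GroupIntoBatches)
      // and issue one multi-row query per batch, something like:
      //   bigtable.readRows(Query.create(TableId).rowKey(k1).rowKey(k2)...)
      .saveAsTextFile(args("output"))

    sc.run()
  }
}
```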
I found in "Pattern: Streaming mode large lookup tables" that performing a lookup for every request is the recommended approach for streaming; however, I'm not sure whether a batch job doing this would overload BigTable.
Do you have any general or concrete recommendations for how to handle this use case?