I have a straightforward scenario for an ETL job: take data from a Kafka topic and put it into an HBase table. In the future I'm going to add support for some processing logic after reading data from the topic. I'm considering two options:
- use Kafka Streams to read data from the topic and then write each record to HBase via the native HBase client
- use a Kafka -> HBase sink connector (Kafka Connect)
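A minimal sketch of the first option, assuming the standard `kafka-streams` and `hbase-client` dependencies; the topic name, table name, column family, and connection settings are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class KafkaToHBase {
    public static void main(String[] args) throws Exception {
        // HBase connection (cluster settings come from hbase-site.xml on the classpath)
        Configuration hbaseConf = HBaseConfiguration.create();
        Connection hbase = ConnectionFactory.createConnection(hbaseConf);
        Table table = hbase.getTable(TableName.valueOf("my_table")); // placeholder table name

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-to-hbase");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("my_topic") // placeholder topic name
               // future per-record logic would slot in here (mapValues, filter, ...)
               .foreach((key, value) -> {
                   try {
                       Put put = new Put(Bytes.toBytes(key));
                       put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                                     Bytes.toBytes(value));
                       table.put(put); // one RPC per record; batching Puts would amortize this
                   } catch (Exception e) {
                       throw new RuntimeException(e);
                   }
               });

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

This is the per-record-write variant that my first concern below is about: every `foreach` invocation issues a `table.put()`.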
I have the following concerns about these options:
- Is it a good idea to write to HBase every time a record arrives in a Kafka Streams window? I suspect this will hurt performance.
- The Kafka HBase connector is maintained only by a third-party developer, so I'm not sure about the code quality of that solution, or whether it allows adding custom aggregation logic over the data from the topic.
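For reference, the second option would be configured rather than coded. A sketch of a Kafka Connect sink config follows; the framework keys (`name`, `connector.class`, `tasks.max`, `topics`) are standard Kafka Connect, but the connector class and any HBase-specific keys are placeholders that depend on which third-party connector is used, so check its documentation:

```properties
# Standard Kafka Connect framework keys
name=hbase-sink
connector.class=com.example.hbase.HBaseSinkConnector  # placeholder; depends on the chosen connector
tasks.max=1
topics=my_topic  # placeholder topic name
# Connector-specific keys (table name, column family, ZooKeeper quorum, ...)
# vary per connector and would go here.
```

Note that Kafka Connect sinks only move data; any custom aggregation would have to happen upstream (e.g. in a Streams app writing to an intermediate topic) or via the connector's transform support, if it has any.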