
I have a straightforward scenario for an ETL job: take data from a Kafka topic and put it into an HBase table. In the future I'm going to add support for some logic after reading data from the topic. I am considering two scenarios:

  • use Kafka Streams to read data from the topic and write each record to HBase via the native HBase client (see the sketch after this list)
  • use a Kafka -> HBase connector
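
For context, here is a minimal sketch of what the first option could look like, assuming string keys and values; the topic, table, and column-family names are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class KafkaToHBaseStream {
    public static void main(String[] args) throws Exception {
        // Native HBase client; "my_table" is a placeholder.
        // Note: Table is not thread-safe, so a real app with multiple
        // stream threads would need per-thread connections.
        Configuration hbaseConf = HBaseConfiguration.create();
        Connection hbase = ConnectionFactory.createConnection(hbaseConf);
        Table table = hbase.getTable(TableName.valueOf("my_table"));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-to-hbase");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("my_topic") // placeholder topic
               .foreach((key, value) -> {
                   try {
                       Put put = new Put(Bytes.toBytes(key));
                       put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                                     Bytes.toBytes(value));
                       // One RPC per record: this is exactly the per-record
                       // write the performance concern below is about.
                       table.put(put);
                   } catch (Exception e) {
                       throw new RuntimeException(e);
                   }
               });

        new KafkaStreams(builder.build(), props).start();
    }
}
```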

I have the following concerns about these options:

  • Is it a good idea to write to HBase every time a record arrives in a Kafka Streams window? I suspect this will degrade performance.
  • The Kafka HBase connector is supported only by a third-party developer, so I'm not sure about the code quality of that solution, or about the option to add custom aggregation logic over the data from a topic.
  • This one is more frequently updated: https://github.com/Landoop/stream-reactor/tree/master/kafka-connect-hbase, and Landoop has enterprise support if needed. – OneCricketeer Jan 08 '19 at 19:05

1 Answer


I myself have been searching for ETL options from Kafka to HBase, and so far my research tells me that it's not a good idea to have an external system interaction within a Kafka Streams application (check the answers here and here). Kafka Streams is super powerful and great if you have a Kafka -> transform_message -> Kafka kind of use case, and eventually you can have Kafka Connect take the data from your Kafka topic and write it to a sink.
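To illustrate the shape that plays to Kafka Streams' strengths, here is a minimal sketch of the Kafka -> transform -> Kafka pattern; the topic names and the uppercase transform are placeholders, and a sink connector would then move `output-topic` into the external store:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class TransformTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transform-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input-topic")
               .mapValues(value -> value.toUpperCase()) // stand-in for real transform logic
               .to("output-topic");                     // a sink connector picks it up from here

        new KafkaStreams(builder.build(), props).start();
    }
}
```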

Since you do not want to use the third-party Kafka Connect plugin for HBase, one option is to write something yourself using the Connect API; the other option is to use the Kafka consumer API and write the app the traditional way: poll the messages, write to the sink, commit the batch, and move on. A sketch of that consumer loop follows.
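A minimal sketch of the plain-consumer approach, again with placeholder topic, table, and group-id names; committing offsets only after the HBase write succeeds gives at-least-once delivery:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

public class KafkaToHBaseConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "hbase-writer");
        // Commit manually, only after the sink write has succeeded.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = hbase.getTable(TableName.valueOf("my_table"));
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {

            consumer.subscribe(Collections.singletonList("my_topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;

                List<Put> batch = new ArrayList<>();
                for (ConsumerRecord<String, String> record : records) {
                    Put put = new Put(Bytes.toBytes(record.key()));
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                                  Bytes.toBytes(record.value()));
                    batch.add(put);
                }
                table.put(batch);      // one batched RPC instead of per-record writes
                consumer.commitSync(); // commit offsets only after the write succeeds
            }
        }
    }
}
```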
