
I am working on a project with two big tables (parent and child) in Oracle. One has 65 million records and the other 80 million. In total, data from 10 columns is required from these tables and saved as one document in Elasticsearch. The two tables can also be loaded separately. What are two comparable options for a one-time load of data from these tables into Elasticsearch, and which of the two would you recommend? The requirement is that it should be fast and simple, so that it can be used not only for the one-time data load but also to rebuild the Elasticsearch index from scratch in case of a failure.

Maarab
  • This answer should help you: https://stackoverflow.com/questions/37613611/multiple-inputs-on-logstash-jdbc/37613839#37613839 – Val Feb 19 '18 at 10:23
  • So Logstash is one option for doing this. That definitely helps. I need to evaluate another solution as well, e.g. to see what makes Logstash a better option than the other. – Maarab Feb 19 '18 at 12:51

1 Answer


As already suggested, one option is Logstash: its advantage is simplicity, but it can be complicated to monitor and difficult to configure if you have to transform some fields during ingestion.
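For reference, a minimal Logstash pipeline for this kind of one-time load could look like the sketch below. The connection string, credentials, join query, and index name are placeholders for your environment; the jdbc input and elasticsearch output plugins are the standard ones, and reusing the parent key as document_id makes a rerun after a failure overwrite documents instead of duplicating them.

    input {
      jdbc {
        # Oracle JDBC driver and connection details (placeholders)
        jdbc_driver_library => "/path/to/ojdbc8.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCL"
        jdbc_user => "app_user"
        jdbc_password => "${ORACLE_PWD}"
        jdbc_fetch_size => 10000
        # Join parent and child and select the 10 required columns
        statement => "SELECT p.id, p.col1, p.col2, c.col3, c.col4 FROM parent p JOIN child c ON c.parent_id = p.id"
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "my_index"
        # Using the parent key as document id makes the load idempotent
        document_id => "%{id}"
      }
    }

You would run it with logstash -f oracle_to_es.conf; for a full rebuild, delete and recreate the index first, then rerun the same pipeline.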

An alternative is Apache NiFi: it offers JDBC and Elasticsearch processors, and you can monitor, start, and stop the ingestion directly from the web interface. With NiFi it is possible to build a more complex and robust pipeline: handling exceptions, converting data types, and performing data enrichment.
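As a rough sketch only (processor names from NiFi's standard bundles; the exact flow depends on your NiFi version and how you want to batch the documents), the ingestion could be wired as:

    QueryDatabaseTable or ExecuteSQL   # run the Oracle join query, emits Avro
      -> ConvertAvroToJSON             # turn each result batch into JSON records
      -> SplitJson                     # optionally split into one flowfile per document
      -> PutElasticsearchHttp          # bulk-index into the target index

Failure relationships on each processor can be routed to a retry or dead-letter branch, which is where NiFi's extra robustness over a plain Logstash pipeline comes from.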

Dario Balinzo