
My current task is to build a Rails application in which users can create connections to RDBMSs (MySQL, PostgreSQL, etc.) and to S3 (for CSV and JSON files).

A user can add an ETL job. An ETL job may have multiple pipelines in the future, but only one for now. A pipeline has a source, a destination and multiple transformations.

In the UI, the user will drag in a source and a destination, each of which can be an RDBMS (MySQL, PostgreSQL, etc.) or a file (CSV/JSON). The configuration forms will differ based on the type (RDBMS, or S3 for files).

After that, the user can add transformations.

Any ideas or pointers on the following?

  • properly saving and loading source, destination and transformation configs in the database
  • running the ETL not from an ETL script but from the pipeline stored in the database
Jurot King

1 Answer


This is a somewhat complicated use-case, because you will have an extra layer of complexity compared to developers using Kiba directly. It can be done, though!

My recommendation is to first create models in your Rails database that describe the job definitions, along with each of the sources, transformations and destinations that you want to expose to your users, in a way that works for you.
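As a rough sketch of what such definitions might hold, the following uses plain Ruby Structs standing in for ActiveRecord models. All names here (`StepDefinition`, `JobDefinition`, `component_key`, `position`, `config_json`) are hypothetical, and the JSON round trip stands in for whatever column type you choose for the per-step settings.

```ruby
require 'json'

# Hypothetical stand-in for an ActiveRecord model; not part of Kiba.
# One record per source, transform or destination in a pipeline.
StepDefinition = Struct.new(:kind, :component_key, :config, :position, keyword_init: true) do
  # Build a step from a stored row (a Hash with string keys).
  def self.from_row(row)
    new(
      kind: row.fetch('kind'),
      component_key: row.fetch('component_key'),
      config: JSON.parse(row.fetch('config_json')),
      position: row.fetch('position')
    )
  end

  # Flatten the step into a row, serializing the config as JSON text.
  def to_row
    {
      'kind' => kind,
      'component_key' => component_key,
      'config_json' => JSON.generate(config),
      'position' => position
    }
  end
end

# Hypothetical job record holding its ordered steps.
JobDefinition = Struct.new(:name, :steps, keyword_init: true) do
  # Steps of one kind ('source', 'transform', 'destination') in UI order.
  def steps_of(kind)
    steps.select { |step| step.kind == kind }.sort_by(&:position)
  end
end
```

Storing a short `component_key` rather than a Ruby class name keeps the database decoupled from your code and makes it easier to restrict what can be instantiated.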

You will have to store the credentials (DB, S3) securely (encryption is more than likely a requirement here).
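To illustrate the idea, here is a minimal sketch that encrypts a connection config with AES-256-GCM from Ruby's standard `openssl` library before it is stored. The `CredentialVault` module is a hypothetical name, and key management (where the 32-byte key lives) is assumed to be handled elsewhere. In a Rails app you may prefer an existing facility such as `ActiveSupport::MessageEncryptor` over rolling your own.

```ruby
require 'openssl'
require 'json'

# Hypothetical helper; assumes a 32-byte secret key supplied by the caller.
module CredentialVault
  CIPHER = 'aes-256-gcm'
  IV_BYTES = 12
  TAG_BYTES = 16

  # Returns a Base64 string containing IV + auth tag + ciphertext.
  def self.encrypt(config, key)
    cipher = OpenSSL::Cipher.new(CIPHER)
    cipher.encrypt
    cipher.key = key
    iv = cipher.random_iv
    ciphertext = cipher.update(JSON.generate(config)) + cipher.final
    [iv + cipher.auth_tag + ciphertext].pack('m0')
  end

  # Reverses encrypt; raises OpenSSL::Cipher::CipherError if the payload
  # was tampered with or the key is wrong.
  def self.decrypt(payload, key)
    raw = payload.unpack1('m0')
    iv = raw[0, IV_BYTES]
    tag = raw[IV_BYTES, TAG_BYTES]
    ciphertext = raw[(IV_BYTES + TAG_BYTES)..-1]

    cipher = OpenSSL::Cipher.new(CIPHER)
    cipher.decrypt
    cipher.key = key
    cipher.iv = iv
    cipher.auth_tag = tag
    plaintext = cipher.update(ciphertext) + cipher.final
    JSON.parse(plaintext.force_encoding('UTF-8'))
  end
end
```

The encrypted string can then be saved in an ordinary text column and decrypted only at the moment the job runs.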

Then once you have your models in place, you would build a UI that will let users edit the models.

After that, you would use the Sidekiq-compatible Kiba API to programmatically create jobs based on your records. Here is some pseudo-code:

job_model = MyApp::Job.find(id)

kiba_job = Kiba.parse do
  job_model.sources.each do |s|
    source s.class_name, s.config
  end
  job_model.transforms.each do |t|
    transform t.class_name, t.config
  end
  job_model.destinations.each do |d|
    destination d.class_name, d.config
  end
end

Kiba.run(kiba_job)

Obviously, you'll want to be super careful about only allowing a restricted set of classes and configurations here (whitelist the allowed setup, do not let your users provide arbitrary input).
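One way to enforce this is a small registry that maps the keys stored in the database to a fixed set of classes and permitted config keys. This is only a sketch: `ComponentRegistry`, `NotAllowed` and `UpcaseField` are hypothetical names, and the single registered transform is a placeholder.

```ruby
# Hypothetical allowlist; nothing here is provided by Kiba itself.
module ComponentRegistry
  class NotAllowed < StandardError; end

  # Placeholder transform used only to populate the registry.
  class UpcaseField
    def initialize(config)
      @field = config.fetch('field')
    end

    def process(row)
      row.merge(@field => row[@field].to_s.upcase)
    end
  end

  # kind => key => class plus the only config keys users may set.
  ALLOWED = {
    'transform' => {
      'upcase_field' => { klass: UpcaseField, permitted_keys: %w[field] }
    }
  }.freeze

  # Returns [klass, config] or raises NotAllowed. User input is never
  # turned into a constant name, so arbitrary classes cannot be reached.
  def self.resolve(kind, key, config)
    entry = ALLOWED.fetch(kind, {})[key]
    raise NotAllowed, "unknown #{kind}: #{key.inspect}" if entry.nil?

    extra = config.keys - entry[:permitted_keys]
    raise NotAllowed, "unexpected config keys: #{extra.join(', ')}" unless extra.empty?

    [entry[:klass], config]
  end
end
```

The resolved class and config could then be passed to `source`, `transform` or `destination` when building the job, instead of values read directly from user input.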

You would also implement a predefined set of sources, transforms & destinations that you want to offer to your users.
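For reference, Kiba components are plain Ruby classes: a source implements `each` and yields rows, a transform implements `process(row)` and returns the row, and a destination implements `write(row)` plus an optional `close`. The classes below are a hedged sketch of that shape using JSON lines over an IO object. Their names and the config keys (`:io`, `:from`, `:to`) are assumptions made for illustration, not components shipped with Kiba.

```ruby
require 'json'
require 'stringio'

# Source: yields one Hash per non-blank JSON line read from the given IO.
class JsonLinesSource
  def initialize(config)
    @io = config.fetch(:io)
  end

  def each
    @io.each_line do |line|
      next if line.strip.empty?
      yield JSON.parse(line)
    end
  end
end

# Transform: renames one key, returning a new row.
class RenameField
  def initialize(config)
    @from = config.fetch(:from)
    @to = config.fetch(:to)
  end

  def process(row)
    row = row.dup
    row[@to] = row.delete(@from) if row.key?(@from)
    row
  end
end

# Destination: writes each row as one JSON line to the given IO.
class JsonLinesDestination
  def initialize(config)
    @io = config.fetch(:io)
  end

  def write(row)
    @io.puts(JSON.generate(row))
  end

  def close
    @io.flush
  end
end
```

Because each class takes a single config hash, the same stored settings can drive both the UI form and the component constructor.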

In order to implement your S3 components, for instance, you may want to check out this SO question.

Hope this helps!

Thibaut Barrère
  • Great!! So Kiba can handle multiple sources and destinations? – Jurot King Jan 18 '18 at 02:02
  • Absolutely. I've documented this here after your question: https://github.com/thbar/kiba/wiki/Can-Kiba-handle-multiple-sources-and-destinations%3F - please consider marking your question as answered if you are satisfied with the answer! – Thibaut Barrère Jan 18 '18 at 12:46