How to pass parameters into your ETL job?

Question

I am building an ETL that will run against different sources, selected by a variable.

How can I execute my job (rake task)

Kiba.run(Kiba.parse(IO.read(etl_file),etl_file))

and pass in parameters for my etl_file to then use for its sources?

source MySourceClass(variable_from_rake_task)

Thibaut Barrère · Accepted Answer · 2018-01-06T19:33:36.027

Author of Kiba here.

EDIT: the solution below still applies, but if you need more flexibility, you can call Kiba.parse with a block. See https://github.com/thbar/kiba/wiki/Considerations-for-running-Kiba-jobs-programmatically-(from-Sidekiq,-Faktory,-Rake,-...) for a detailed explanation.

Since you are using a Rake task (and not calling Kiba in a parallel environment, like Resque or Sidekiq), what you can do right now is leverage ENV variables, like this:

CUSTOMER_IDS=10,11,12 bundle exec kiba etl/upsert-customers.etl

Or, if you are using a rake task you wrote, you can do:

task :upsert_customers => :environment do
  ENV['CUSTOMER_IDS'] = [10, 11, 12].join(',')
  etl_file = 'etl/upsert-customers.etl'
  Kiba.run(Kiba.parse(IO.read(etl_file),etl_file))
end
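If the rake process goes on to do other work after the job, the ENV value set above stays in place. A minimal sketch of one way to scope it, assuming a hypothetical with_env helper (not part of Kiba) and plain string keys:

```ruby
# Hypothetical helper, not part of Kiba: set ENV variables for the
# duration of a block, then restore the previous values.
# Keys and values must be strings (ENV rejects symbols and integers).
def with_env(vars)
  previous = vars.keys.to_h { |key| [key, ENV[key]] }
  vars.each { |key, value| ENV[key] = value }
  yield
ensure
  # Assigning nil to an ENV key deletes it, so variables that were
  # unset before the block are unset again afterwards.
  previous&.each { |key, value| ENV[key] = value }
end

# Possible use inside the rake task (parse and run both happen in the block):
#
#   with_env('CUSTOMER_IDS' => [10, 11, 12].join(',')) do
#     etl_file = 'etl/upsert-customers.etl'
#     Kiba.run(Kiba.parse(IO.read(etl_file), etl_file))
#   end
```

This is still process-wide state, so it does not make the approach safe for parallel execution.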

Then in upsert-customers.etl:

# quick parsing
ids = ENV['CUSTOMER_IDS'].split(',').map { |c| Integer(c) }

source Customers, ids: ids
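The quick parsing above raises a NoMethodError on nil when the variable is missing. A slightly more defensive sketch, assuming a hypothetical parse_ids helper (the name and error message are illustrative only):

```ruby
# Hypothetical helper: turn a string such as "10, 11,12" into [10, 11, 12].
def parse_ids(raw)
  raise ArgumentError, 'CUSTOMER_IDS is not set' if raw.nil? || raw.strip.empty?

  # Integer() raises ArgumentError on junk such as "12abc" or an empty
  # entry, unlike String#to_i, which would silently return a number.
  # Base 10 keeps a value like "010" from being read as octal.
  raw.split(',').map { |id| Integer(id.strip, 10) }
end

# Possible use in the .etl file:
#
#   source Customers, ids: parse_ids(ENV['CUSTOMER_IDS'])
```

Failing loudly here means a typo in the variable name stops the job before any source is read.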

As I stated before, this will only work for command line mode, where ENV can be leveraged safely.

For parallel executions, please follow https://github.com/thbar/kiba/issues/18, since I'm going to work on it.

Let me know if this properly answers your need!

Well I wouldn't say this feels super natural, yet I've been using this technique previously with activewarehouse-etl for years. It avoids cluttering the command line with very specific switches that end up being not flexible enough. Glad I could help! — Thibaut Barrère, Oct 07 '15 at 19:19

score 0 · Answer 2 · edited May 23 '17 at 12:29

Looks like this is tracked at https://github.com/thbar/kiba/issues/18 and was already asked in "Pass Parameters to Kiba run Method".

answered Oct 06 '15 at 06:30

ElderFain

I answered in a separate reply, since I think there is a better answer in your current context (calling from rake task). – Thibaut Barrère Oct 06 '15 at 08:07
