I am building an ETL that will run against different sources, selected by a variable.

How can I execute my job (rake task)

Kiba.run(Kiba.parse(IO.read(etl_file),etl_file))

and pass in parameters for my etl_file to then use for its sources?

source MySourceClass(variable_from_rake_task)

ElderFain

2 Answers

Author of Kiba here.

EDIT: the solution below still applies, but if you need more flexibility, you can use Kiba.parse with a block. See https://github.com/thbar/kiba/wiki/Considerations-for-running-Kiba-jobs-programmatically-(from-Sidekiq,-Faktory,-Rake,-...) for a detailed explanation.

Since you are using a Rake task (and not calling Kiba in a parallel environment, like Resque or Sidekiq), what you can do right now is leverage ENV variables, like this:

CUSTOMER_IDS=10,11,12 bundle exec kiba etl/upsert-customers.etl

Or, if you are using a rake task you wrote, you can do:

task :upsert_customers => :environment do
  ENV['CUSTOMER_IDS'] = [10, 11, 12].join(',')
  etl_file = 'etl/upsert-customers.etl'
  Kiba.run(Kiba.parse(IO.read(etl_file),etl_file))
end

Then in upsert-customers.etl:

# quick parsing
ids = ENV['CUSTOMER_IDS'].split(',').map { |c| Integer(c) }

source Customers, ids: ids
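
If the variable might be missing or hand-typed, the quick parsing above can be made a little stricter. This is only a sketch: `parse_ids` is a hypothetical helper, not part of Kiba, and it takes any Hash-like object (such as `ENV`) so it can be tried without touching the real environment.

```ruby
# Hypothetical helper (not part of Kiba): read a comma-separated list of
# integer IDs from an ENV-like object and fail loudly on bad input.
def parse_ids(env, key)
  # fetch with a block gives a clearer error than calling split on nil
  raw = env.fetch(key) { raise ArgumentError, "#{key} is not set" }
  # Integer(str, 10) forces base 10 and raises ArgumentError on non-numeric input
  raw.split(',').map { |c| Integer(c.strip, 10) }
end

# Usage with a plain Hash standing in for ENV
ids = parse_ids({ 'CUSTOMER_IDS' => '10, 11,12' }, 'CUSTOMER_IDS')
```

In the .etl file this would presumably be called as `parse_ids(ENV, 'CUSTOMER_IDS')` before the `source` declaration.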

As I stated before, this will only work for command line mode, where ENV can be leveraged safely.
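
Because ENV is global to the process, one cautious option in a Rake task is to set the variable only for the duration of the run and restore the previous value afterwards. The sketch below assumes a hypothetical `with_env` helper (not part of Kiba or Rake); it does not make ENV safe for parallel executions, it only avoids leaking the value to later tasks in the same process.

```ruby
# Hypothetical helper (not part of Kiba or Rake): temporarily set ENV
# variables for the duration of a block, then restore the old values.
def with_env(vars)
  # Remember the current values (nil when a variable is unset)
  saved = vars.keys.to_h { |k| [k, ENV[k]] }
  vars.each { |k, v| ENV[k] = v }
  yield
ensure
  # Assigning nil to an ENV key deletes it, so unset variables stay unset
  saved.each { |k, v| ENV[k] = v } if saved
end

# Usage: in the rake task, the block body would be the Kiba.run(...) call;
# here it just reads the variable back.
seen = with_env('CUSTOMER_IDS' => [10, 11, 12].join(',')) do
  ENV['CUSTOMER_IDS']
end
```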

For parallel executions, please track https://github.com/thbar/kiba/issues/18, since I'm going to work on it.

Let me know if this properly answers your need!

Thibaut Barrère

Looks like this is tracked at https://github.com/thbar/kiba/issues/18 and was already asked in "Pass Parameters to Kiba run Method".

ElderFain