Questions tagged [kiba-etl]

36 questions
7 votes · 1 answer

How to do an aggregation transformation in a Kiba ETL script (kiba gem)?

I want to write a Kiba ETL script that reads from a source CSV and writes to a destination CSV, with a list of transformation rules among which the second transformer is an aggregation, i.e. an operation such as select name, sum(euro) group by name. Kiba ETL Script…
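One possible approach, sketched under stated assumptions rather than taken from an answer: assuming Kiba 3, where a transform can return nil from process to hold rows back and can define a close method that yields rows once the source is exhausted, the group-by becomes a transform that accumulates totals and emits them at the end. SumByKey and its arguments are hypothetical names; the block drives the transform by hand, without the kiba gem, to show the mechanics.

```ruby
# Hypothetical aggregating transform, roughly "select name, sum(euro) group by name".
# Assumes Kiba 3 semantics: returning nil from process holds a row back, and an
# optional close method can yield rows once the source is exhausted.
class SumByKey
  def initialize(group_key, sum_key)
    @group_key = group_key
    @sum_key = sum_key
    @totals = Hash.new(0) # group value => running total
  end

  # Accumulate instead of passing the row downstream.
  def process(row)
    @totals[row.fetch(@group_key)] += Float(row.fetch(@sum_key))
    nil
  end

  # Emit one aggregated row per group, in first-seen order.
  def close
    @totals.each do |group, total|
      yield({ @group_key => group, @sum_key => total })
    end
  end
end

# Inside a Kiba job definition (not executed here):
#   transform SumByKey, :name, :euro
```

All groups stay in memory until close, which is fine for a moderate number of distinct names.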
4 votes · 2 answers

Should I use Rails for consistency? (for an ETL project)

Context: I'm new to Ruby and all that jazz, but I'm not new to development. I'm taking over a project based on two Rails/Puma repositories for web & APIs. I'm building a new repository for a backend data-processing app, using Kiba, that will run through…
Tristan M · reputation 133
4 votes · 1 answer

Modify a range of rows after applying transformations

I want to write a Kiba transformation that allows me to insert the same information for a specific number of rows. In this case I have an XLS file that contains subheaders, and these subheaders…
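A minimal sketch of one reading of this question, assuming each subheader should be copied onto the data rows beneath it: a transform remembers the latest subheader and merges it into every following row. Returning nil from process is how a Kiba transform drops a row; FillDownSubheader, the detector lambda, and the :label and :section keys are hypothetical.

```ruby
# Hypothetical "fill-down" transform for sheets with subheader rows: a subheader
# row is remembered and dropped, and its label is copied onto each following data
# row until the next subheader. How a subheader is recognised is an assumption,
# so it is passed in as a lambda.
class FillDownSubheader
  def initialize(subheader_detector, label_key, target_key)
    @subheader_detector = subheader_detector
    @label_key = label_key
    @target_key = target_key
    @current_label = nil
  end

  def process(row)
    if @subheader_detector.call(row)
      @current_label = row[@label_key]
      return nil # returning nil drops the subheader row from the pipeline
    end
    row.merge(@target_key => @current_label)
  end
end

# Inside a Kiba job definition (not executed here):
#   transform FillDownSubheader, ->(row) { row[:amount].nil? }, :label, :section
```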
3 votes · 2 answers

Transforming a table into a hash of sets using Kiba-ETL

I'm busy working through an ETL pipeline, but for this particular problem I need to take a table of data and turn each column into a set (that is, an array of unique values). I'm struggling to wrap my head around how I would accomplish this within the Kiba…
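A sketch, assuming Kiba 3 semantics where a transform may return nil from process and yield rows from an optional close method: accumulate each column into a Ruby Set while rows stream through, then emit a single hash of sets at the end. ColumnsToSets is a hypothetical name, and the transform is exercised by hand here rather than inside a job.

```ruby
require "set"

# Hypothetical transform that reduces a whole table to one row of the form
# { column => Set of distinct values }.
class ColumnsToSets
  def initialize
    # Lazily create an empty Set for each column name on first use.
    @sets = Hash.new { |hash, column| hash[column] = Set.new }
  end

  # Collect values; returning nil means no per-row output is emitted.
  def process(row)
    row.each { |column, value| @sets[column] << value }
    nil
  end

  def close
    # Copy into a plain Hash so downstream code does not inherit the default block.
    yield({}.merge(@sets))
  end
end
```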
3 votes · 2 answers

How to pass parameters into your ETL job?

I am building an ETL job that will be run on different sources, selected by a variable. How can I execute my job (a rake task) with Kiba.run(Kiba.parse(IO.read(etl_file),etl_file)) and pass in parameters for my etl_file to use for its sources? source…
ElderFain · reputation 93
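A sketch of one way to get per-run parameters into a source: a Kiba source is a class with an each method that yields rows, and the arguments written after the class name in a source declaration are passed to its constructor. Building the job inside a method that calls Kiba.parse with a block (rather than parsing a script file) lets local variables such as a path flow into that declaration; check this pattern against the README of the Kiba version you use. LineSource and build_job are hypothetical names, and the Kiba part is shown as a comment because it needs the gem.

```ruby
# Hypothetical source: one row per line of the file whose path is supplied at run time.
class LineSource
  def initialize(path)
    @path = path
  end

  # Kiba calls each and consumes every yielded row.
  def each
    File.foreach(@path).with_index(1) do |line, number|
      yield({ line_number: number, text: line.chomp })
    end
  end
end

# Building the job inside a method lets each run pass its own parameters
# (not executed here; requires the kiba gem):
#   def build_job(path)
#     Kiba.parse do
#       source LineSource, path
#       # transforms and destinations...
#     end
#   end
#   Kiba.run(build_job("data/input.txt"))
```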
2 votes · 1 answer

Is there an obvious way to reduce rows when using Kiba?

Firstly, Thibaut, thank you for Kiba. It goes toe-to-toe with 'enterprise' grade ETL tools and has never let me down. I'm busy building an ETL pipeline that takes a number of rows and reduces them down into a single summary row. I get the feeling…
2 votes · 1 answer

Can I run a Kiba job inside a Rails service?

I am running a Kiba job from a Rails service that is called inside a controller. Here is the current code: class KibaRunner attr_reader :job,:logger def initialize(job) @job = job @logger = Rails.logger end def run logger.info "Running…
Jurot King · reputation 79
2 votes · 1 answer

Saving and loading etl pipeline from database

My current task is to build a Rails application in which users can create connections to RDBMSs (MySQL, PostgreSQL, etc.) and S3 (for CSV and JSON). Users can add an ETL job. An ETL job may have multiple pipelines in the future, but only a single one for now. A pipeline…
Jurot King · reputation 79
2 votes · 1 answer

Best practice for using Kiba as a batch process on files

We'd like to run Kiba as a batch process on a series of files. What would be the best structure to give a file mask, download the files from FTP, and then run the ETL job on each, sending a success or failure notification on a per-file basis? Is…
Steve Wetzel · reputation 435
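A sketch of the per-file part only (file mask and FTP download are left out): run the same job once per file, rescue errors per file so one bad file does not stop the batch, and notify for each outcome. process_files, run_job, and notify are hypothetical; in practice run_job might be a lambda that builds and runs a Kiba job for the given path, and the list of paths could come from something like Dir.glob("incoming/*.csv") after the download step.

```ruby
# Hypothetical per-file driver: runs run_job for every path and reports success
# or failure for each one without letting a single failure abort the batch.
def process_files(paths, run_job, notify)
  paths.map do |path|
    run_job.call(path)
    notify.call(path, :success, nil)
    [path, :success]
  rescue StandardError => e
    # Only this file is marked as failed; the loop continues with the next path.
    notify.call(path, :failure, e)
    [path, :failure]
  end
end
```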
2 votes · 1 answer

Can I duplicate rows with kiba using a transform?

I'm currently using your gem to transform a CSV that was web-scraped from a personnel database that has no API. From the scraping I ended up with a CSV. I can process it pretty well using your gem; there's only one bit I am wondering about. Consider the…
Andy · reputation 23
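A sketch assuming the goal is to turn one scraped row into several, for example when one cell holds several values. Assuming Kiba 3, a transform's process method may yield any number of rows; this one yields one row per value and returns nil so that only the yielded rows continue. SplitIntoRows, the :email column, and the ";" separator are hypothetical.

```ruby
# Hypothetical transform that duplicates a row once per value found in one cell,
# e.g. "a@x.org; ann@y.org" becomes two rows with one address each.
class SplitIntoRows
  def initialize(key, separator = ";")
    @key = key
    @separator = separator
  end

  def process(row)
    row.fetch(@key).to_s.split(@separator).each do |value|
      yield row.merge(@key => value.strip)
    end
    nil # nothing extra is passed on besides the yielded rows
  end
end
```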
2 votes · 1 answer

Is it possible to do a lookup using Kiba?

Is it possible to do a "lookup" with Kiba, since it's quite a common step in an ETL? If so, could you show a demo? Thanks.
L_G · reputation 209
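Because a Kiba transform is a plain Ruby class, a lookup can be sketched as a transform that enriches each row from an in-memory table loaded before the job runs (for example from a CSV or a database query). Lookup, its arguments, and the country-code data are hypothetical; for very large reference tables a per-row query or a join in the source may fit better.

```ruby
# Hypothetical lookup transform: adds target_key to each row by looking up the
# value of key in a Hash, falling back to default when there is no match.
class Lookup
  def initialize(table, key, target_key, default = nil)
    @table = table
    @key = key
    @target_key = target_key
    @default = default
  end

  def process(row)
    row.merge(@target_key => @table.fetch(row[@key], @default))
  end
end

# Inside a Kiba job definition (not executed here):
#   transform Lookup, { "FR" => "France" }, :country_code, :country_name, "unknown"
```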
2 votes · 0 answers

Pass Parameters to Kiba run Method

I'm trying to use something similar to the code used by the Kiba CLI, programmatically, as ... filename = './path/to/script.rb' script_content = IO.read(filename) job_definition = Kiba.parse(script_content, filename) …
slabounty · reputation 704
1 vote · 1 answer

Is there a way to return some data at the end of a Kiba job?

It would be great if there were a way to get some kind of return object from a Kiba ETL run, so that I could use the data in it to report on how well the pipeline ran. We have a job that runs every 10 minutes that processes on average 20 -…
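One way to get a report back, sketched without relying on anything Kiba.run itself returns: pass a caller-owned Hash into a component's constructor (Kiba passes the arguments written after the class name to new), let the component update it, and read it after the run. ReportingDestination and its stats keys are hypothetical; the Kiba wiring is shown as a comment because it needs the gem.

```ruby
# Hypothetical destination that appends rows to `output` (anything responding
# to <<) and records counters in a caller-owned `stats` Hash.
class ReportingDestination
  def initialize(output, stats)
    @output = output
    @stats = stats
    @stats[:rows_written] = 0
  end

  def write(row)
    @output << row
    @stats[:rows_written] += 1
  end

  # Called once at the end of the run.
  def close
    @stats[:finished_at] = Time.now
  end
end

# Possible use in a Kiba job (not executed here):
#   stats = {}
#   job = Kiba.parse do
#     # source and transforms...
#     destination ReportingDestination, [], stats
#   end
#   Kiba.run(job)
#   puts stats[:rows_written]
```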
1 vote · 1 answer

How to filter data in extractor?

I've got a long-running pipeline that has some failing items (items that at the end of the process are not loaded because they fail database validation or something similar). I want to rerun the pipeline, but only process the items that failed the…
Viktor · reputation 2,982
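A sketch of filtering at the extraction stage, assuming the ids of the failed items from the previous run are available as a list: the source only yields rows whose :id is in that list, so the rest of the pipeline never sees the others. OnlyIdsSource is hypothetical, and rows stands in for whatever the real extractor reads; if the source is a database, putting the filter in the query itself avoids reading the skipped rows at all. Returning nil from a transform is the other way to drop rows, but only after they have been extracted.

```ruby
require "set"

# Hypothetical source wrapper that yields only rows whose :id is in `ids`.
class OnlyIdsSource
  def initialize(rows, ids)
    @rows = rows
    @ids = ids.to_set # constant-time membership checks
  end

  def each
    @rows.each do |row|
      yield row if @ids.include?(row[:id])
    end
  end
end
```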
1 vote · 1 answer

How to log "current status" of ETL job?

I'm running a Kiba ETL pipeline in a Rails background job. I'd like to provide some status to the user while the job is running. What would be the best way to achieve this? Can I use some variable somehow? Or should I save the status update in the…
Viktor · reputation 2,982
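A sketch of progress reporting from inside the pipeline: a pass-through transform counts rows and calls a callback every N rows and once at the end. The callback could, for example, update a status record that the UI polls, but that part is an assumption. ProgressReporter and its arguments are hypothetical, and the final report relies on Kiba 3 calling close on transforms after the last row.

```ruby
# Hypothetical pass-through transform that reports progress every `every` rows
# and once more when the pipeline finishes. `callback` is any callable taking
# (rows_seen, finished).
class ProgressReporter
  def initialize(every, callback)
    @every = every
    @callback = callback
    @count = 0
  end

  def process(row)
    @count += 1
    @callback.call(@count, false) if (@count % @every).zero?
    row # pass the row through unchanged
  end

  def close
    @callback.call(@count, true)
  end
end

# Inside a Kiba job definition (not executed here):
#   transform ProgressReporter, 100, ->(count, done) { puts "#{count} rows (done: #{done})" }
```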