8

I'm researching ETL tools to import flat files into a database and subsequently export XML files.

Many of the tools support generating code to use in your application; however, I haven't found any that support using code already in your application. Our model is complex (relationships, validations, polymorphic associations, callbacks, etc.).

What tools are available that will allow reuse of existing code? Or am I stuck recreating (and maintaining) my model in the ETL tool?

Note: I need an ETL tool (as opposed to bulk inserts or activerecord-import) because of the transformations. We receive data from over 200 different sources in a variety of formats, levels of completeness, and cleanliness. Also, the "designer" that most tools include is more realistic for the less-technical users who will be defining the transformations.

Kyle West
  • Where is the transformation logic? Where do you *want* it to be? – Mark Thomas Mar 02 '12 at 14:57
  • It depends. We have a bunch built into the application already, but there are others that need to be done on a per-source basis. We're talking automotive data... Our application knows 99-01, 1999-01, 1999-2001 are all the same thing, and that HND, HNDA, HONDA, and HONDA/ACURA are all the same thing. These are the tip of the iceberg. Each of our sources has a different format. One may combine years like 99-01 and another puts them in different columns. Some will put multiple makes (HONDA, BMW) in one row, others will use two rows. Again, tip of the berg, but those are what the ETL tool should handle. – Kyle West Mar 02 '12 at 15:19
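
To make the examples in the comment above concrete, here is a rough sketch of the two per-source transformations it mentions: expanding a year range and splitting a multi-make row. `YearRange`, `split_makes`, and the pivot of 50 for two-digit years are assumptions for illustration, not code from the application being discussed.

```ruby
# Hypothetical helpers; none of these names come from the original application.
module YearRange
  # Assumption: two-digit years of 50 or more mean 19xx, otherwise 20xx.
  PIVOT = 50

  # Turn "99" or "1999" into a four-digit Integer year.
  def self.full_year(text)
    digits = text.strip
    year = Integer(digits, 10)
    return year if digits.length == 4

    year + (year >= PIVOT ? 1900 : 2000)
  end

  # Expand "99-01", "1999-01", or "1999-2001" into a list of years.
  def self.parse(text)
    first, last = text.split("-", 2)
    start = full_year(first)
    finish = last ? full_year(last) : start
    raise ArgumentError, "range ends before it starts: #{text}" if finish < start

    (start..finish).to_a
  end
end

# Split a row whose :make column holds several makes into one row per make.
def split_makes(row)
  row[:make].split(",").map { |make| row.merge(make: make.strip) }
end

YearRange.parse("99-01")     # => [1999, 2000, 2001]
YearRange.parse("1999-2001") # => [1999, 2000, 2001]
```

The pivot is the part most likely to differ per source, which is why it is a named constant here rather than buried in the arithmetic.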

3 Answers

6

ActiveWarehouse might prove useful. Initial search results make the project feel a bit old and defunct, but a little digging yielded a fairly active, well-documented branch of the project on GitHub: https://github.com/activewarehouse/activewarehouse-etl

Levi
  • It also [just went 1.0](http://www.rubyflow.com/items/7311-activewarehouse-ruby-etl-v1-0-0-rc1-is-out). I had found this a while ago, good to see it's still alive. I'm going to take a closer look. – Kyle West Mar 05 '12 at 15:32
3

Write your own. ETL is a very simple process, and Ruby provides enough reflection support to handle it with some simple code. ETL tools are not really helpful here; just generate Graphviz (dotty) files to show the data sources, flows, and transformations.

I've done the same in Smalltalk for a data conversion. There I used Glamour and Mondrian from the MOOSE reengineering tool suite to provide more visibility.
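
As a rough illustration of the "write your own" approach, the sketch below chains named transformation steps in plain Ruby and emits a Graphviz DOT description of the flow. The `Pipeline` class, the step names, and the `dealer_feed.csv` source label are all hypothetical; in a real application the blocks would call into the existing model code instead of the inline string handling shown here.

```ruby
# A minimal hand-rolled pipeline; every name here is illustrative only.
class Pipeline
  Step = Struct.new(:name, :callable)

  def initialize(source_name)
    @source_name = source_name
    @steps = []
  end

  # Register a named transformation. The block receives a row (a Hash)
  # and returns the transformed row.
  def step(name, &block)
    @steps << Step.new(name, block)
    self
  end

  # Apply every step, in order, to every row.
  def run(rows)
    rows.map do |row|
      @steps.reduce(row) { |current, step| step.callable.call(current) }
    end
  end

  # Describe the source and the chain of steps as a Graphviz digraph.
  def to_dot
    names = [@source_name, *@steps.map(&:name)]
    edges = names.each_cons(2).map { |from, to| %(  "#{from}" -> "#{to}";) }
    ["digraph etl {", *edges, "}"].join("\n")
  end
end

pipeline = Pipeline.new("dealer_feed.csv")
  .step("upcase make") { |row| row.merge(make: row[:make].upcase) }
  .step("strip whitespace") { |row| row.transform_values(&:strip) }

cleaned = pipeline.run([{ make: " honda " }]) # => [{ make: "HONDA" }]
puts pipeline.to_dot
```

Because each step is just a named block, the same list drives both the transformation and the diagram, so the picture cannot drift from what the code does.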

Stephan Eggermont
0

Modularize: you want the Rails app and the ETL to look up the meaning of 'HND' in the same place. Set up an API for that.
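
One way to read this, sketched under assumptions: a small shared module that both the Rails app and the ETL job call, so the alias table lives in exactly one place. `Vocabulary`, `canonical_make`, and the choice of "HONDA" as the canonical spelling are hypothetical; the aliases are the ones mentioned in the question's comments. The same lookup could equally sit behind an HTTP endpoint.

```ruby
# Hypothetical shared vocabulary; the table below is illustrative only.
module Vocabulary
  # Assumption: "HONDA" is the canonical spelling for this group of aliases.
  MAKE_ALIASES = {
    "HND" => "HONDA",
    "HNDA" => "HONDA",
    "HONDA" => "HONDA",
    "HONDA/ACURA" => "HONDA"
  }.freeze

  # Returns the canonical make, or nil for an unknown value so the caller
  # can decide whether to reject the row or flag it for review.
  def self.canonical_make(raw)
    MAKE_ALIASES[raw.to_s.strip.upcase]
  end
end

Vocabulary.canonical_make(" hnd ") # => "HONDA"
Vocabulary.canonical_make("BMW")   # => nil
```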

Amala