11

Assume you have a small project which on the surface looks like a good match for an ETL tool like Talend.

But assume further that you have never used Talend, that you do not trust "visual programming" tools in general, and that you would rather code everything the old-fashioned way (text in a nice IDE!) with the help of an appropriate language & support libraries.

What are some language patterns & support libraries that could help you stay away from the ETL tool temptation/trap?

Alex R
  • 1
    ETL: Extract, Transform, Load. http://en.wikipedia.org/wiki/Etl – Thilo Mar 12 '10 at 01:44
  • 1
    I found this link helpful when I was trying to make that decision: [Kimball University: The Subsystems of ETL Revisited](http://www.informationweek.com/news/software/bi/202405400?queryText=subsystems+etl+revisited) – Bradford Sep 09 '11 at 14:09

5 Answers

5

It depends on whether the deliverable is the processor or the output itself. If you just need to deliver the output, you don't need to maintain the code. If the code needs to be maintained then will it be you maintaining it or somebody else?

If somebody else needs to maintain it, I'd use Java or give them Talend.

If it's throwaway code, I'd use whatever is easiest or most fun to program with.

If you need to maintain it and the processing is complex, I'd use Scala. It has:

  • some libraries to interact with databases
  • XML literals
  • parser combinators
  • useful operations in its collections library (map, filter, groupBy, partition, ...)
  • and of course access to any existing Java library.
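
To make the collections point concrete, here is a minimal sketch of a tiny extract/transform/load step written with plain Scala collections. Everything in it is an assumption made up for illustration: the `Order` record, the `EtlSketch` object, the comma-separated input format and the sample rows are all hypothetical, and printing stands in for a real load step such as a database insert.

```scala
// Hypothetical record type, invented for this illustration.
final case class Order(id: Int, country: String, amount: BigDecimal)

object EtlSketch {
  // Extract: parse one comma-separated line; malformed lines yield None.
  def parse(line: String): Option[Order] =
    line.split(",").map(_.trim) match {
      case Array(id, country, amount) =>
        scala.util.Try(Order(id.toInt, country, BigDecimal(amount))).toOption
      case _ => None
    }

  // Transform: keep positive amounts, normalise country codes,
  // then total the amounts per country.
  def totalsByCountry(orders: Seq[Order]): Map[String, BigDecimal] =
    orders
      .filter(_.amount > 0)
      .map(o => o.copy(country = o.country.toUpperCase))
      .groupBy(_.country)
      .map { case (country, group) => country -> group.map(_.amount).sum }

  def main(args: Array[String]): Unit = {
    // Made-up sample input; a real job would read a file or a table.
    val raw = Seq("1, us, 10.50", "2, de, 20", "3, US, 4.50", "oops", "4, de, -5")
    val orders = raw.flatMap(line => parse(line))

    // partition splits one collection into (matching, non-matching).
    val (large, small) = orders.partition(_.amount >= 10)

    // Load: println stands in for writing to the target system.
    totalsByCountry(orders).foreach { case (c, t) => println(s"$c -> $t") }
    println(s"large=${large.size}, small=${small.size}")
  }
}
```

Each stage is an ordinary function over an immutable collection, so the pipeline can be unit tested and refactored like any other code, which is the main thing you give up with a generated job.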
huynhjl
  • I've checked the code Talend generates... Are you sure it can be maintained after creation? – yura Sep 16 '11 at 04:53
  • @yura, I just looked briefly at *Talend* and haven't personally used it. I meant that the Talend definitions and configurations could be maintained (not necessarily the generated code). – huynhjl Sep 16 '11 at 05:40
  • Okay, I just wanted to know your opinion on whether visual languages (like Talend or Pentaho) can be used for complex ETL rules that require long-term support and maintenance. – yura Sep 16 '11 at 06:38
4

Check out DataExpress. It's a Scala-based, cross-database ETL toolkit.

mitalia
Didia
4

I used to think that "visual programming" is something for people who can't program. Then I was exposed to Talend in a project, and I realized that this type of tool is exactly right for the job, when it comes to moving data from A to B, and transforming it in the process. It's component-oriented software design, by a more academic label.

I still consider myself a decent programmer who can do anything, and then some, with a text editor and a shell prompt. But I've become a big fan of Talend as well.

Full disclosure: I now work for the company :-)

drmirror
2

I think this is a pretty good match for Rails-inspired frameworks, such as Grails on Groovy or Lift on Scala.

Daniel C. Sobral
0

Depending on the size of the DB schema, you could map everything real quick in Hibernate and just use the resulting object model to do your work (depending on what you want the ETL tool for anyway).

Eric