This took hours for me to figure out, so I'm posting Q&A style to hopefully save someone else the trouble.
PDI ships with a transform step called User Defined Java Expression (UDJE), which can take in fields and output the results of Java expressions on those fields. It can also access the Joda-Time library. Joda eats ISO8601's for breakfast.
The Java expression:
org.joda.time.format.ISODateTimeFormat.dateTimeNoMillis().parseDateTime(dt8601)
will take in a field dt8601
and leave you with a Joda DateTimeFormatter that can work with the date. In my case I wanted to convert to Unix time, so my full expression was:
org.joda.time.format.ISODateTimeFormat.dateTimeNoMillis().parseDateTime(dt8601).getMillis() / 1000L
For the other fields in the UDJE, "New field" is the field name you want to give to the results of this expression, and "Value type" must be Integer (which, in PDI, corresponds to Java's Long).
Separately, a couple cautionary tales about approaches that did not work:
- Using a "Modified Java Script Value" step. You'd think maybe you could use
Date.parse
, but all I got was NaN. There's a bunch of custom date/string conversion functions bundled with this step, but none handle timezones nor ISO8601s. I found the step's documentation lacking in clarity and examples and gave up.
- Using
javax.xml.bind.DatatypeConverter.parseDateTime(dt8601).getTime()
for the UDJE expression, as suggested in a Pentaho forums post and an SO Q&A. This makes use of an XML DatatypeConverter
and returns a Calendar
object which you can further process. As the poster notes though, XML dates are only "inspired by" ISO8601. This will probably function, but I for one am tired of writing date processing code and want something that strictly conforms to the international standard.