I have found this question and answer already, and together they provide most of the answer to my problem, in a format entirely intelligible to a complete novice!
I have an additional query to hopefully fill in some of the gaps in both my process and my understanding:
My salient input is a series of XML files, each containing all the information for a single person, rather than a single CSV containing multiple people. This information requires splitting and manipulation, so my Kettle transformation has multiple streams to reflect this (and the parent Job will have a 'for each file' loop to handle multiple files).
To use the method outlined in the selected answer to import the processed data from a person file into a database, do I need to recombine my many processed data streams into a few streams, each containing all the data that needs to be joined, or is there an alternative way of approaching this?
i.e. if I have data streams A, B, C, D and E in my Kettle transformation, and within my database A is joined to B and C, and D to E, do I necessarily need to combine streams A, B and C into one stream, and D and E into another?
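For concreteness, here is a simplified, made-up sketch of the kind of person file I'm describing (all element names are purely illustrative, not my real data): the personal details would feed stream A, the address and contact information streams B and C, and the employment record and employer details streams D and E respectively.

```xml
<!-- Hypothetical person file; element names are illustrative only -->
<person>
  <!-- Stream A: core personal details (parent table) -->
  <details>
    <name>Jane Doe</name>
    <dateOfBirth>1980-01-01</dateOfBirth>
  </details>
  <!-- Streams B and C: address and contact details, each joined to A -->
  <address type="home">
    <line1>1 Example Street</line1>
    <postcode>AB1 2CD</postcode>
  </address>
  <contact type="email">jane.doe@example.com</contact>
  <!-- Streams D and E: employment record and employer, joined to each other -->
  <employment>
    <role>Analyst</role>
    <startDate>2010-06-01</startDate>
    <employer>
      <name>Example Ltd</name>
    </employer>
  </employment>
</person>
```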
Thanks in advance