I would start with the simplest possible thing which, as you said, is to use external files and then call Kiba on each one. For example:
- Build a rake task to download the files locally into a well-known folder which will act as an inbox (and remove them from the FTP server, or at least move them to a separate folder there, to avoid double-processing). See here for interesting links on how to do that.
- Then build another rake task to iterate over the inbox folder and process each file (using Dir[pattern].each).
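As a rough sketch of the download step, assuming an object that behaves like Net::FTP (its nlst, getbinaryfile and rename methods): the method name download_to_inbox and the folder names are hypothetical, and the sketch assumes the remote folder contains only plain files.

```ruby
require "fileutils"

# Downloads every file found in remote_dir into the local inbox, then moves
# the remote copy to remote_done_dir so it is not downloaded twice.
# `ftp` is assumed to respond to nlst, getbinaryfile and rename like Net::FTP.
# Assumption: remote_dir contains only plain files, no sub-folders.
def download_to_inbox(ftp, remote_dir:, remote_done_dir:, inbox:)
  FileUtils.mkdir_p(inbox)
  ftp.nlst(remote_dir).map do |entry|
    # Some servers return full paths, others bare names; normalize to a name.
    name = File.basename(entry)
    local_path = File.join(inbox, name)
    ftp.getbinaryfile("#{remote_dir}/#{name}", local_path)
    ftp.rename("#{remote_dir}/#{name}", "#{remote_done_dir}/#{name}")
    local_path
  end
end
```

The method returns the list of local paths, which a rake task could log. Moving the remote file only after the download succeeded means a crash mid-way leaves the file on the server for the next run.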
Make sure to use a helper such as:
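    # Kernel#system returns false on a non-zero exit status and nil if the
    # command could not be run at all; both cases raise here.
    def system!(command)
      fail "Command #{command} failed" unless system(command)
    end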
to make sure you detect failures in execution when making system calls.
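Here is a minimal sketch of how the inbox loop and that helper could fit together. The names process_inbox and kiba_command, the SOURCE_FILE environment variable and the etl/import.etl path are all placeholders of mine, and the command line assumes a Kiba version that ships the kiba executable.

```ruby
require "shellwords"

def system!(command)
  fail "Command #{command} failed" unless system(command)
end

# Hypothetical: builds the shell command that runs one Kiba job on one file.
# The file name is passed through an environment variable the ETL script reads.
def kiba_command(file, script: "etl/import.etl")
  "SOURCE_FILE=#{Shellwords.escape(file)} bundle exec kiba #{Shellwords.escape(script)}"
end

# Runs one command per file matching the pattern, in name order.
# The first failing command raises, which stops the loop.
# A block can be given to build a different command for each file.
def process_inbox(pattern, &build_command)
  build_command ||= method(:kiba_command)
  Dir[pattern].sort.each do |file|
    system!(build_command.call(file))
  end
end
```

Inside a rake task this could be called as process_inbox("inbox/*.csv"). Stopping at the first failure keeps the unprocessed files in the inbox, so a later run can pick them up again.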
For your ETL file itself, you would use one at_exit block to capture failure and notify accordingly (see example here with Bugsnag), and a post_process block to capture success and notify in that case.
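A possible shape for the failure side, as a sketch only: notify and failed_exit? are hypothetical helpers (swap notify for Bugsnag, email or whatever you use), and the post_process part is shown as a comment because it only makes sense inside a Kiba ETL file.

```ruby
# Hypothetical notifier; replace with Bugsnag, email, Slack, etc.
def notify(message)
  warn message
end

# Returns true when the process is exiting because of an error.
# `error` is the value of $! inside an at_exit block: nil on a normal exit,
# SystemExit when exit/abort was called, or the uncaught exception otherwise.
def failed_exit?(error)
  return false if error.nil?
  return !error.success? if error.is_a?(SystemExit)
  true
end

# Registered at the top of the ETL file so it covers the whole run.
at_exit do
  notify("ETL failed: #{$!.class}: #{$!.message}") if failed_exit?($!)
end

# In the Kiba ETL file, the success side would go in a post_process block,
# which as far as I know is not called when an earlier error aborted the run:
#
#   post_process do
#     notify("ETL succeeded")
#   end
```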
This will definitely work and is simple. That said, there are other possibilities, such as a single ETL file which downloads the files in a pre_process block, then has a source which yields one filename per downloaded file, and maybe a transform which itself calls kiba on the command line, or even more advanced solutions.
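To illustrate the "one filename per downloaded file" source: a Kiba source is a plain Ruby class with an each method that yields rows, so a sketch could look like this. The class name InboxFilesSource and the :filename key are my own choices, not something Kiba provides.

```ruby
# Hypothetical Kiba source: yields one row per file matching the pattern.
class InboxFilesSource
  def initialize(pattern)
    @pattern = pattern
  end

  # Kiba calls each and passes every yielded row to the transforms.
  def each
    Dir[@pattern].sort.each do |file|
      yield({ filename: file })
    end
  end
end

# In the ETL file it would be declared roughly like this (the folder name is
# a placeholder), with the download happening earlier in a pre_process block:
#
#   source InboxFilesSource, "inbox/*.csv"
```

A transform further down would then receive rows such as { filename: "inbox/a.csv" } and decide what to do with each file.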
I would stick to the simplest possible solution to get started, as always!