
I've been asked to design a batch application that will retrieve data (specifically, a detailed list of transactions) from an external vendor on a periodic basis. We have agreed to use XML for the data exchange, but we are investigating different methods/protocols to facilitate the actual data transfer. The vendor suggested email or FTP as a means to transfer the data, but we rejected the first option outright due to logistics and reliability concerns.

As for the second option, FTP, I have always been hesitant to use it in a production environment where reliability is a concern. A design whereby a vendor publishes files to an FTP server to be periodically pulled down seems unreliable and error-prone. My initial reaction would be to gravitate towards something like a web service (which this particular vendor may or may not be able or willing to provide), where the data could be queried, as needed, for a specific time period.

In general, what is the best approach to use in a situation such as this? Is FTP (or SFTP) generally considered an acceptable option, or is there something better? Is a web service overkill for such a simple exchange of data? Are there other viable options that I am completely overlooking?

– Jeffrey P
  • How large are the files you're talking about? – M.Babcock Dec 28 '11 at 21:20
  • I would not anticipate them to be very large: probably fewer than 5,000 transaction records encoded into XML, maybe 10 fields each with a max of 50 characters. Allowing for 20% inflation due to XML markup, perhaps 3 MB max? Overall, the solution would need to be scalable, though, in case transaction volume increased. – Jeffrey P Dec 29 '11 at 18:21

3 Answers


File transfer presents a number of complications.

I would prefer a web service, or just HTTPS access to the file with digest/basic authentication, but for very large files, that may not be practical for them.
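
A minimal sketch of the pull side, assuming the vendor exposed the file over HTTPS with basic authentication; the URL, credentials, and file name here are all hypothetical:

```python
import requests  # third-party: pip install requests

# Hypothetical endpoint and credentials; substitute whatever the vendor provides.
URL = "https://vendor.example.com/exports/transactions.xml"

resp = requests.get(URL, auth=("batch_user", "secret"), timeout=60)
resp.raise_for_status()  # fail loudly on 4xx/5xx rather than saving an error page

with open("transactions.xml", "wb") as f:
    f.write(resp.content)
```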

Another option would be a shared bucket on Amazon S3, where you have read access and they have write access. I have used that a couple of times as a poor man's secure file transfer.
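
For the S3 variant, a sketch of the read side using boto3; the bucket name and prefix are made up, and it assumes AWS credentials are already configured in your environment:

```python
import boto3  # third-party: pip install boto3

# Hypothetical shared bucket the vendor writes to and you read from.
BUCKET = "vendor-transaction-exports"

s3 = boto3.client("s3")

# List what the vendor has published under the agreed prefix, then pull each file.
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="daily/")
for obj in listing.get("Contents", []):
    key = obj["Key"]
    s3.download_file(BUCKET, key, key.split("/")[-1])
```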

I have used flavors of FTP in this way, and here are some tips if you do:

  1. Use a secure version like SFTP - FTP is just not secure for the credentials or data.

  2. Use a semaphore file to indicate when the latest file is complete and available, or make sure that when they write the file to the FTP directory, they move it into place as a whole, so you never pick up an incomplete file.

  3. Make sure each file has a unique file name (timestamp, sequence number, etc.) so you can keep track of which files you have processed and which you haven't. Do not reuse file names: you will not know whether you have already processed a given file, and you could hit a race condition if the file is updated while you are accessing it.

  4. Use a hash value to check for a successful transfer. They could provide an MD5 hash for the file, which you then check against your own computed value once the copy completes. I have often used the MD5 file as a semaphore as well, both to indicate that a file is available and to provide a means of verifying that the transfer was complete and correct. (A sketch combining these tips follows the list.)
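
A minimal sketch combining tips 1-4, using the third-party paramiko library for SFTP. The host, directory layout, and the convention that a `<name>.md5` file doubles as the semaphore are all assumptions:

```python
import hashlib
import paramiko  # third-party: pip install paramiko

# Hypothetical vendor endpoint and layout.
HOST, USER, KEYFILE = "sftp.vendor.example.com", "batch_user", "/etc/keys/vendor_rsa"

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # pin the real host key in production
ssh.connect(HOST, username=USER, key_filename=KEYFILE)     # tip 1: SFTP, not plain FTP
sftp = ssh.open_sftp()

for name in sftp.listdir("outbox"):
    if not name.endswith(".md5"):
        continue  # tip 2: the .md5 file is the semaphore; no .md5 means the data is not ready
    data_name = name[: -len(".md5")]  # e.g. transactions_20120106.xml (tip 3: unique names)
    sftp.get(f"outbox/{data_name}", data_name)

    # Tip 4: verify the download against the vendor-supplied MD5.
    with sftp.open(f"outbox/{name}") as f:
        expected = f.read().decode().split()[0]
    with open(data_name, "rb") as f:
        actual = hashlib.md5(f.read()).hexdigest()
    if actual != expected:
        raise IOError(f"Checksum mismatch for {data_name}")

sftp.close()
ssh.close()
```

Remembering which unique file names have already been processed (tip 3) is left out here; in practice you would record them in a database or a local state file.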

– Andrew Kuklewicz
  • Your answer makes several good points and brings up solutions to pitfalls that may occur. Although I agree that a web service would be preferable, I doubt the vendor would be willing to supply one for our consumption alone. – Jeffrey P Dec 29 '11 at 18:16
  • That is my experience as well. When a vendor suggests FTP, it means they are unwilling to do any real development (like an API), but will write a script to toss a file somewhere for you. FTP is the lowest common denominator, and ends up being used for this stuff all too often. – Andrew Kuklewicz Dec 31 '11 at 00:19
  • There are plenty of relatively easy HTTP(S) clients to use that can replace `ftp` commands in a script (cURL comes to mind). A WebDAV backend might also be a solution, along the lines of a webservice/HTTPS access. HTTPS is actually quite viable even for uploading large files (especially via HTTP `PUT`, plain or via WebDAV). 10s of MBs shouldn't be a problem, depending on the connection. – Bruno Jan 06 '12 at 22:28

You could use AS2.

However, this is a push mechanism. mendelson AS2 is free gateway software you could use. You would set up a "channel", and everything would be transferred to you without any coding. If problems pop up, you should receive notifications.

FTP is pretty insecure. It should be reliable though.

– Udo Held

I've implemented all of the solutions in the previous answers, and so far AS2 (using mendelson) has been the easiest and least error-prone.

My observations:

  • Implementing SFTP/FTPS is straightforward and fairly reliable, with a low barrier to entry, but you end up needing to write your own polling methods (as Andrew mentioned; see the polling sketch after this list).
  • Web services are great, but only if the vendor properly designs and documents them. I've found that smaller partners tend to whip an API together and then break it when adding functionality, or add information to the transfer at other customers' requests without updating the documentation to reflect the new functionality. In one case this pushed us to move to SFTP.
  • AS2 is nice, as it's secure and pretty low-maintenance with mendelson. Add a directory watcher on the server's output folders and you end up with realtime1 processing.
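
As a rough illustration of the "write your own polling" point above, a stdlib-only sketch that watches a local drop directory and tracks which files it has already handled; the directory name and interval are arbitrary:

```python
import os
import time

DROP_DIR = "incoming"  # hypothetical directory the SFTP job downloads into
processed = set()      # in production, persist this set (file, database, etc.)

while True:
    for name in sorted(os.listdir(DROP_DIR)):
        if name in processed or not name.endswith(".xml"):
            continue
        path = os.path.join(DROP_DIR, name)
        print(f"processing {path}")  # hand off to the real XML handler here
        processed.add(name)
    time.sleep(60)  # poll once a minute; tune to taste
```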

Of course, at the end of the day, your vendor is going to dictate how far they're willing to go in providing connection methods, and you'll need to choose the best method they offer.

1 Realtime processing is not actually realtime processing, but a management-acceptable approximation thereof. Your managers may differ from mine.

– Robert H