Background: I am designing a software application that reads millions of files (possibly many more) and either converts or simply parses them. Part of the requirement is to build a scalable, distributed system so that reading and parsing can be scaled out as needed.
Basically, a minimally detailed list of file names is in one DB, and clients need to access that list to know which files need to be parsed/converted next. The files themselves are on another server/location. While most of the pieces are designed, one critical piece that needs revisiting is how the file names are fed to the different clients.
I have two options now:
1. Design a single service that sits next to the DB, channels all requests for file names, and feeds the clients. In this case, clients talk to the service (using a predefined protocol/format) and get the list from it.
2. Design the clients to talk directly to the DB and implement the synchronization/channeling within the clients.
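To make the second option concrete, here is a minimal sketch of clients claiming file names directly from the DB. Everything in it is an assumption for illustration: the `files` table, the `claim_token` column, the helper names, and the use of an in-memory SQLite database in place of the real DB. The idea is that each client marks a batch of unclaimed names in a single transaction, so no two clients receive the same file.

```python
import sqlite3
import uuid


def make_db(names):
    """Build an in-memory stand-in for the file-name DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, claim_token TEXT)")
    conn.executemany("INSERT INTO files (name) VALUES (?)", [(n,) for n in names])
    conn.commit()
    return conn


def claim_batch(conn, batch_size):
    """Mark up to batch_size unclaimed names in one transaction and return them."""
    token = uuid.uuid4().hex  # unique per call, so only this call's rows are read back
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE files SET claim_token = ? WHERE name IN ("
            "  SELECT name FROM files WHERE claim_token IS NULL"
            "  ORDER BY name LIMIT ?)",
            (token, batch_size),
        )
        rows = conn.execute(
            "SELECT name FROM files WHERE claim_token = ? ORDER BY name",
            (token,),
        ).fetchall()
    return [row[0] for row in rows]
```

A client would call `claim_batch(conn, 100)` in a loop until it gets an empty list back. With a server database, the same idea would depend on that database's own locking or transaction isolation, which is not shown here.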
My only concern with the first option is whether it is a scalable architecture/design. Has anyone dealt with this kind of situation in a scalable architecture, where one resource becomes the critical factor in scaling (in my case, one service feeding/servicing all clients)?
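For reference, this is roughly what I have in mind for the first option, reduced to a single process. It is only a sketch under assumptions: the class and function names are made up, threads stand in for remote clients, and a lock stands in for whatever serialization the real service would do. Handing out names in batches rather than one at a time should at least reduce the number of requests the single service has to answer per file.

```python
import threading
from itertools import islice


class FileNameDispenser:
    """Hypothetical single service: hands out file names in batches."""

    def __init__(self, names, batch_size):
        self._names = iter(names)
        self._batch_size = batch_size
        self._lock = threading.Lock()

    def next_batch(self):
        # Only one client at a time advances the shared iterator.
        with self._lock:
            return list(islice(self._names, self._batch_size))


def run_clients(dispenser, n_clients):
    """Simulate n_clients pulling batches until the dispenser is empty."""
    received = [[] for _ in range(n_clients)]

    def worker(index):
        while True:
            batch = dispenser.next_batch()
            if not batch:  # an empty batch means nothing is left
                return
            received[index].extend(batch)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

For example, `run_clients(FileNameDispenser(names, 50), 4)` returns one list of names per client, and together those lists contain each name exactly once.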