In my company, we have a continuous learning process. Every 5-10 minutes we create a new model in HDFS. A model is a folder of several files:
- model, ~1 GB (binary file)
- model metadata, ~1 KB (text file)
- model features, ~1 KB (CSV file) ...
On the other hand, we have hundreds of model serving instances that need to download the model to the local filesystem every 5-10 minutes and serve from it. Currently we use WebHDFS from our service (the Java FileSystem client), but this probably puts load on our Hadoop cluster, since it redirects every read to the concrete DataNodes.
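For reference, a minimal sketch of our current approach, assuming a hypothetical NameNode address and model paths (everything below is illustrative, not our real code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class ModelDownloader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // webhdfs:// redirects each file read to the DataNode that holds the
        // blocks, which is why every serving instance ends up talking to the
        // cluster directly.
        FileSystem fs = FileSystem.get(URI.create("webhdfs://namenode:9870"), conf);

        Path remoteModelDir = new Path("/models/latest");            // hypothetical layout
        Path localModelDir  = new Path("file:///var/serving/model"); // hypothetical layout

        // copyToLocalFile walks the folder and pulls every file
        // (model binary, metadata, features CSV).
        fs.copyToLocalFile(false, remoteModelDir, localModelDir, true);
    }
}
```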
We are considering using the HttpFS service instead. Does it have any caching capability, so that the first request loads a folder into the service's memory and subsequent requests are served from the already downloaded result?
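To make the caching requirement concrete, here is a client-side sketch of the idea, with a hypothetical modification-time check (class name, paths, and the address are all made up). Ideally something equivalent would happen inside HttpFS itself, so that hundreds of instances requesting the same model don't each hit the cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class CachedModelDownloader {
    private long lastSeenModificationTime = -1L;

    // Re-download the model folder only when its modification time on HDFS
    // has advanced; otherwise keep serving from the local copy.
    public void refreshIfChanged() throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("webhdfs://namenode:9870"), conf);

        Path remoteModelDir = new Path("/models/latest");
        FileStatus status = fs.getFileStatus(remoteModelDir);

        if (status.getModificationTime() > lastSeenModificationTime) {
            fs.copyToLocalFile(false, remoteModelDir,
                    new Path("file:///var/serving/model"), true);
            lastSeenModificationTime = status.getModificationTime();
        }
    }
}
```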
What other technology or solution could be used for such a use case?