I have a requirement to develop a document repository that will hold all documents related to different listed companies. Each document belongs to exactly one company. The repository has to be exposed as a REST API. Documents can be in PDF, HTML, Word, or Excel format. Along with the documents themselves, I need to store metadata such as company ID, document format, timestamp, and document language. Since the number of documents will grow over the coming years, it's important that the application is scalable.
I also need to translate non-English documents and store the translated English version in some parent-child relationship with the original, so that either version is easy to retrieve from the other.
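To make the requirement more concrete, here is a rough Java sketch of the metadata and the parent-child link as I currently picture them. Every name in it (`DocumentMeta`, `DocumentIndex`, `parentId`, `englishVersionOf`) is a placeholder I made up, and the in-memory maps only stand in for whatever database or search index would actually hold the metadata. It is not a proposed implementation, just an illustration of the relationship I am after.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical metadata for one stored document (field names are assumptions).
final class DocumentMeta {
    final String id;
    final String companyId;
    final String format;     // e.g. "pdf", "html", "docx", "xlsx"
    final String language;   // e.g. "en", "de"
    final Instant createdAt;
    final String parentId;   // null for an original, set for a translation

    DocumentMeta(String id, String companyId, String format,
                 String language, Instant createdAt, String parentId) {
        this.id = id;
        this.companyId = companyId;
        this.format = format;
        this.language = language;
        this.createdAt = createdAt;
        this.parentId = parentId;
    }

    boolean isTranslation() {
        return parentId != null;
    }
}

// In-memory stand-in for the real metadata store.
final class DocumentIndex {
    private final Map<String, DocumentMeta> byId = new HashMap<>();
    private final Map<String, List<String>> childrenByParent = new HashMap<>();

    void add(DocumentMeta doc) {
        // A translation must point at an original that is already stored.
        if (doc.parentId != null && !byId.containsKey(doc.parentId)) {
            throw new IllegalArgumentException("Unknown parent: " + doc.parentId);
        }
        byId.put(doc.id, doc);
        if (doc.parentId != null) {
            childrenByParent
                .computeIfAbsent(doc.parentId, k -> new ArrayList<>())
                .add(doc.id);
        }
    }

    // Returns the document itself if it is English, else its English child, if any.
    Optional<DocumentMeta> englishVersionOf(String id) {
        DocumentMeta doc = byId.get(id);
        if (doc == null) {
            return Optional.empty();
        }
        if ("en".equals(doc.language)) {
            return Optional.of(doc);
        }
        return childrenByParent
            .getOrDefault(id, Collections.<String>emptyList())
            .stream()
            .map(byId::get)
            .filter(d -> "en".equals(d.language))
            .findFirst();
    }
}

final class DocumentIndexDemo {
    public static void main(String[] args) {
        DocumentIndex index = new DocumentIndex();
        index.add(new DocumentMeta("d1", "ACME", "pdf", "de", Instant.now(), null));
        index.add(new DocumentMeta("d2", "ACME", "pdf", "en", Instant.now(), "d1"));
        System.out.println(index.englishVersionOf("d1").map(d -> d.id).orElse("none"));
    }
}
```

In this sketch the translation is just another document whose `parentId` points at the original, so the same REST resource and metadata fields would serve both. Whether that is a sensible model at scale is part of what I am asking.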
Any insights on the approach, libraries/JARs to use, best practices, and references are welcome.