I am working on a project to index a collection of PDF documents. For this task I have chosen Elasticsearch, since it is based on Apache Lucene. I have gone through several docs
and Stack Overflow questions, including: How to index a pdf file in Elasticsearch 5.0.0 with ingest-attachment plugin?
In terms of performance, storage space, and effectiveness, which would be the better approach: using the ingest plugin as described there, or parsing the PDF myself and storing every one, two, or three pages (the number would be a configurable parameter) as a separate document?
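
To make the second option concrete, here is a rough sketch of the page grouping I have in mind. It assumes the text has already been extracted page by page with some PDF library (not shown). The function name `chunk_pages` and the field names `source`, `first_page`, `last_page`, and `content` are placeholders I made up for illustration, not anything Elasticsearch requires.

```python
from typing import Dict, Iterator, List


def chunk_pages(pages: List[str], pages_per_doc: int, source: str) -> Iterator[Dict]:
    """Group consecutive page texts into one document per `pages_per_doc` pages."""
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    for start in range(0, len(pages), pages_per_doc):
        group = pages[start:start + pages_per_doc]
        yield {
            "source": source,                 # hypothetical field: originating file
            "first_page": start + 1,          # 1-based page range covered by this document
            "last_page": start + len(group),
            "content": "\n".join(group),      # text that would be indexed
        }


# Stand-in for text already extracted page by page from a PDF.
pages = ["page one text", "page two text", "page three text"]
docs = list(chunk_pages(pages, pages_per_doc=2, source="example.pdf"))
```

Each resulting dictionary would then be indexed as its own Elasticsearch document, so a search hit could point at a page range instead of at the whole file.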