I have dug into the Mapper Attachments plugin a bit, and I find it very inflexible and slow. You are also mixing concerns (indexing and text extraction), which will make performance tuning more complex.
First: You will be better off installing Tika and extracting the text yourself. This will probably also improve performance, since you are no longer sending large base64-encoded BLOBs over HTTP to ES, and text extraction gets its own process and heap.
Second: Extract the text page by page. See the question "Is it possible to extract text by page for word/pdf files using Apache Tika?" for how to do that.
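As a rough sketch of the page-splitting step: as far as I know, Tika's XHTML output for a PDF wraps each page in a `<div class="page">` element, so the pages can be separated after extraction. The code below assumes that markup and assumes you already have the XHTML as a string; `PageSplitter` and `split_pages` are names made up for this example, not part of Tika.

```python
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Collect the text inside each <div class="page"> element.

    Assumption: the input is Tika-style XHTML where every PDF page
    is wrapped in its own <div class="page">.
    """

    def __init__(self):
        super().__init__()
        self.pages = []    # finished text of each page, in document order
        self._depth = 0    # div nesting depth inside the current page (0 = outside)
        self._chunks = []  # text fragments of the page currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # A nested div inside a page: track it so we find the right close.
            self._depth += 1
        elif "page" in (dict(attrs).get("class") or "").split():
            # A new page starts here.
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # The page div just closed: normalise whitespace and store it.
                self.pages.append(" ".join(" ".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def split_pages(xhtml):
    """Return a list with one text string per page."""
    splitter = PageSplitter()
    splitter.feed(xhtml)
    splitter.close()
    return splitter.pages
```

The index of each entry plus one is the page number, which is what the indexing options below rely on. Word documents generally have no fixed page boundaries in the file, so this sketch only applies to formats where Tika emits page divs.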
Third: One option is to index each page as a separate field (for example "pdf_page_1", "pdf_page_2", etc.). You should then get back the field name for each hit (for instance via highlighting) and can derive the page number from it.
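A minimal sketch of the per-page-field idea, built as plain dictionaries so you can send them with whatever client you use. It assumes that wildcard field names are accepted in `multi_match` and in `highlight`, and that highlighted field names come back under the `highlight` key of each hit. The `title` field and the function names are hypothetical.

```python
import re

# Matches the per-page field names used in this sketch, e.g. "pdf_page_12".
PAGE_FIELD = re.compile(r"^pdf_page_(\d+)$")


def build_document(title, pages):
    """Build one document with a separate field per page."""
    doc = {"title": title}  # "title" is a made-up example field
    for number, text in enumerate(pages, start=1):
        doc["pdf_page_%d" % number] = text
    return doc


def build_query(text):
    """Search all page fields and ask for highlights on them."""
    return {
        "query": {"multi_match": {"query": text, "fields": ["pdf_page_*"]}},
        "highlight": {"fields": {"pdf_page_*": {}}},
    }


def pages_from_hit(hit):
    """Derive the matching page numbers from a hit's highlighted field names."""
    numbers = []
    for field in hit.get("highlight", {}):
        match = PAGE_FIELD.match(field)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)
```

One thing to keep in mind with this layout is that every new page number adds a field to the mapping, so very long documents will produce very wide mappings.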
Another solution, which is perhaps more flexible, is to a) index your documents with the PDF file contents all in one array field, like pdf_contents: ["here comes page 1", "here comes page 2"], and b) create a separate index for the PDF file contents, indexing each page as a separate document that includes a field for the page number.
Then run one query for your "canonical" result list, and once you have the hits, run a second query against the PDF contents index, restricted to the documents that are in the result list.
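The two-index approach could look roughly like the sketch below. It only builds the request bodies and groups the results; sending them is left to your client. The field names `doc_id`, `page` and `content` are invented for this example, and the `bool` query with a `filter` clause assumes a reasonably recent ES version (older versions used a `filtered` query instead).

```python
def page_documents(doc_id, pages):
    """Build one document per page for the separate page index."""
    return [
        {"doc_id": doc_id, "page": number, "content": text}
        for number, text in enumerate(pages, start=1)
    ]


def page_query(text, hit_ids):
    """Second query: match page text, but only for documents already in the result list."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"content": text}},
                "filter": {"terms": {"doc_id": list(hit_ids)}},
            }
        }
    }


def group_pages(page_hits):
    """Map each parent document id to the sorted page numbers that matched."""
    grouped = {}
    for hit in page_hits:
        source = hit["_source"]
        grouped.setdefault(source["doc_id"], []).append(source["page"])
    for numbers in grouped.values():
        numbers.sort()
    return grouped
```

Usage would be: run the first query against the main index, collect the ids of the hits, pass them to `page_query`, run that against the page index, and feed the returned hits to `group_pages` to attach page numbers to each entry of the result list.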