```python
#!/usr/bin/env python
import tika
tika.initVM()
from tika import parser

parsed = parser.from_file('frank_diary.docx')
print(parsed["metadata"])  # document metadata (a dict)
print(parsed["content"])   # the whole extracted text as a single string
```
With this code I can read the whole file, but not page by page.
Ref. I went through this reference, but it did not work. Is there any way to read a PDF/DOCX with tika page by page?
I would like to read PDF/DOCX files with tika page by page and get a list with one dict per page.
Example: `pages = [{"page_number": 1, "content": "content"}, {"page_number": 2, "content": "content"}, {"page_number": 3, "content": "content"}]`
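A possible approach for PDFs, sketched under assumptions: tika-python's `parser.from_file` accepts `xmlContent=True`, which returns Tika's XHTML instead of plain text, and Tika's PDF parser wraps each page in a `<div class="page">` element. The sketch below splits that XHTML into the list-of-dicts format above using only the standard library's `html.parser`. `PageSplitter`, `split_pages` and `read_pdf_pages` are hypothetical helpers, not part of tika, and `read_pdf_pages` assumes a working tika-python setup (Java plus a reachable Tika server).

```python
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Hypothetical helper: collects the text of each <div class="page">
    element in the XHTML that Tika returns for a PDF."""

    def __init__(self):
        super().__init__()
        self.pages = []   # finished page texts, in document order
        self._buf = None  # text chunks of the current page, or None outside a page
        self._depth = 0   # <div> nesting depth inside the current page

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._buf is None:
            # Start collecting only when a div carries the "page" class.
            classes = (dict(attrs).get("class") or "").split()
            if "page" in classes:
                self._buf = []
                self._depth = 1
        else:
            self._depth += 1  # nested div (e.g. an annotation) inside a page

    def handle_endtag(self, tag):
        if self._buf is None:
            return
        if tag == "p":
            self._buf.append("\n")  # keep paragraph breaks between <p> elements
        elif tag == "div":
            self._depth -= 1
            if self._depth == 0:  # the page div itself has closed
                self.pages.append("".join(self._buf).strip())
                self._buf = None

    def handle_data(self, data):
        if self._buf is not None:  # text outside page divs is ignored
            self._buf.append(data)


def split_pages(xhtml):
    """Hypothetical helper: Tika XHTML -> [{"page_number": n, "content": text}, ...]."""
    splitter = PageSplitter()
    splitter.feed(xhtml)
    splitter.close()
    return [
        {"page_number": number, "content": text}
        for number, text in enumerate(splitter.pages, start=1)
    ]


def read_pdf_pages(path):
    """Hypothetical helper: parse a PDF with tika-python and split it per page.
    Assumes tika-python is installed and can start or reach a Tika server."""
    from tika import parser  # imported here so split_pages works without tika

    parsed = parser.from_file(path, xmlContent=True)  # XHTML instead of plain text
    return split_pages(parsed.get("content") or "")
```

Usage would be something like `read_pdf_pages('frank_diary.pdf')`. As far as I know this only works for PDFs: a DOCX file does not store fixed page boundaries (pagination is computed when a word processor lays the document out), so Tika has no pages to report for it. One workaround is converting the DOCX to PDF first, for example with LibreOffice's `soffice --headless --convert-to pdf frank_diary.docx`, and then running the resulting PDF through the same function.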