`#!/usr/bin/env python
import tika
tika.initVM()
from tika import parser

# Parse the whole document in one call; Tika returns the metadata and the full text.
parsed = parser.from_file('frank_diary.docx')
print(parsed["metadata"])
print(parsed["content"])`

With this code I am able to read the whole file, but not page by page.

Ref. I went through this reference, but it is not working. Is there any way to read a PDF/DOCX file page by page using tika?

I expect to read PDF/DOCX using tika page by page.

Example: Dict = [{"page_number": 1, "content": "content"}, {"page_number": 2, "content": "content"}, {"page_number": 3, "content": "content"}]
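
To make the expected structure concrete, here is a minimal sketch of one possible approach; I have not verified it end to end. It assumes that `parser.from_file(..., xmlContent=True)` returns XHTML and that Tika wraps each PDF page in a `<div class="page">` element. The helper names `split_pages` and `read_pages` are hypothetical. DOCX files do not store fixed page boundaries, so this will probably yield no pages for them unless they are converted to PDF first.

```python
from html.parser import HTMLParser


class _PageSplitter(HTMLParser):
    """Collects the text found inside each <div class="page"> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pages = []   # one list of text chunks per page
        self._depth = 0   # <div> nesting depth inside the current page; 0 = outside any page

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            # A nested <div> inside a page: track it so its closing tag
            # does not end the page early.
            self._depth += 1
        elif dict(attrs).get("class") == "page":
            self._depth = 1
            self.pages.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.pages[-1].append(data)


def split_pages(xhtml):
    """Turn Tika-style XHTML into [{"page_number": n, "content": text}, ...]."""
    splitter = _PageSplitter()
    splitter.feed(xhtml)
    splitter.close()
    return [
        {"page_number": number, "content": "".join(chunks).strip()}
        for number, chunks in enumerate(splitter.pages, start=1)
    ]


def read_pages(path):
    """Hypothetical wrapper: parse a file with Tika and split it by page."""
    # Imported here so split_pages can be used without tika or Java installed.
    from tika import parser

    # Assumption: xmlContent=True makes Tika return XHTML instead of plain text.
    parsed = parser.from_file(path, xmlContent=True)
    return split_pages(parsed["content"] or "")
```

Calling `read_pages('some_file.pdf')` (the file name is only a placeholder) would then give a list in the shape shown above, provided the page `<div>` assumption holds for the document.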
