How can I read a Word document page by page with docx2python? I want to build a dictionary whose keys are the page numbers and whose values are the text of the corresponding pages: {"1": "content 1", "2": "content 2", ...}. If this is not possible with docx2python, which package could I use instead?
This is my code so far; it returns the whole Word document as a single string. Thank you.
!pip install docx2python
from docx2python import docx2python
def read_word(file_path):
    """
    Function that reads a Word file and returns a string
    """
    # Extract docx content, ignore images
    doc = docx2python(file_path, extract_image=False)
    # Get all text in a single string
    output = doc.text
    return output
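
For reference, here is a minimal sketch of the kind of result I am after. It does not use docx2python, since I am not sure that package exposes page boundaries. It rests on some assumptions: a .docx file does not store pages as such, so the sketch reads word/document.xml directly with the standard library and treats explicit page breaks (`w:br` with `w:type="page"`) and the `w:lastRenderedPageBreak` hints as page boundaries. Those hints are only present if the application that last saved the file wrote them, so the result may not match what Word displays. The function names `split_document_xml_by_pages` and `read_word_by_pages` are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _start_new_page(pages):
    # Assumption: only open a new page if the current one already has text.
    # This avoids counting an explicit break and the rendered-break hint
    # that usually follows it as two boundaries (it also drops blank pages).
    if "".join(pages[-1]).strip():
        pages.append([])


def split_document_xml_by_pages(xml_bytes):
    """Split the raw bytes of word/document.xml into {"1": text, "2": text, ...}."""
    root = ET.fromstring(xml_bytes)
    body = root.find(W + "body")
    pages = [[]]
    # Note: paragraphs nested in text boxes would be visited twice.
    for para in body.iter(W + "p"):
        for node in para.iter():
            if node.tag == W + "t" and node.text:
                pages[-1].append(node.text)
            elif node.tag == W + "lastRenderedPageBreak":
                _start_new_page(pages)
            elif node.tag == W + "br" and node.get(W + "type") == "page":
                _start_new_page(pages)
        pages[-1].append("\n")  # keep paragraph boundaries as newlines
    texts = ["".join(chunks).strip() for chunks in pages]
    texts = [text for text in texts if text]
    return {str(number): text for number, text in enumerate(texts, start=1)}


def read_word_by_pages(file_path):
    """Read a .docx file and return a dict of page number -> page text."""
    with zipfile.ZipFile(file_path) as archive:
        xml_bytes = archive.read("word/document.xml")
    return split_document_xml_by_pages(xml_bytes)
```

Calling `read_word_by_pages("my_file.docx")` (a placeholder file name) would then return the dictionary described above, within the limits of the page-break assumptions.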