I have a document like this document, and I want to convert the DOCX file into an HTML file using the Python package docx:
docx.convert_to_html(input)
I am wondering whether there is any method or Python package that lets me extract the position of each line. By position I mean the bounding box of each line, relative to the page.
I want my final HTML result to look like this:
<line b="38" id="line_0" l="616" r="795" style="white-space:nowrap; position:absolute; left:616px;top:14px; font-size:19.0px; " t="14">DOC
# 2019-0046594<br></line>
Regardless of the output format, I need the four position values "l, r, b, t" (left, right, bottom, top) for each line.
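To make the target concrete, here is a minimal sketch of how I would build one such element once the four values are known. It does not solve the extraction itself; the helper name `format_line` and its parameters are hypothetical, and it uses only the standard library.

```python
from html import escape


def format_line(index, text, l, t, r, b, font_size):
    """Build one <line> element in the target format (hypothetical helper).

    l, t, r, b are the left, top, right and bottom edges of the line in
    pixels, relative to the page.
    """
    style = (
        "white-space:nowrap; position:absolute; "
        f"left:{l}px;top:{t}px; font-size:{font_size}px; "
    )
    return (
        f'<line b="{b}" id="line_{index}" l="{l}" r="{r}" '
        f'style="{style}" t="{t}">{escape(text)}<br></line>'
    )


# Values taken from the example element above.
example = format_line(0, "DOC # 2019-0046594", l=616, t=14, r=795, b=38, font_size=19.0)
print(example)
```

So the missing piece is only where to get `l`, `t`, `r` and `b` from for each line of the DOCX file.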