I'm trying to convert a .docx file to HTML in Python while preserving font family, font size, and colors. I have tried several packages: python-docx, docx2html, and Mammoth.

None of them works for me. They do produce HTML, but most of the styling (fonts, sizes, and colors) is dropped.

I also tried opening the .docx file with Python's zipfile module and reading the underlying XML. All of the document's information is there, so I'm now thinking of converting that XML to HTML myself in Python. Maybe there is an existing parser for this purpose?

Here is the snippet I tried with python-docx, but I'm getting None values:

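from docx import Document

d = Document('1.docx')
# Iterating d.styles yields style objects (not names), so read attributes from each one.
for style in d.styles:
    font = getattr(style, 'font', None)  # numbering styles have no font
    print(f'{style.name} : {font.name if font else None}, {font.size if font else None}')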

And here is my code for reading the XML with zipfile:

import zipfile

# path is the location of the .docx file
with zipfile.ZipFile(path) as docx:
    content = docx.read('word/document.xml').decode('utf-8')
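The XML-to-HTML idea could be sketched roughly as follows, using only the standard library. This is a minimal sketch, not a full converter: the helper names (attr, run_css, document_xml_to_html) are made up for illustration, and it assumes the formatting is set directly on each run (w:rPr). Formatting inherited from word/styles.xml, themes, tables, lists, and runs nested inside hyperlinks are all ignored.

```python
import html
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NS = {'w': W}


def attr(elem, name):
    """Return the w:-namespaced attribute of elem, or None if absent."""
    return None if elem is None else elem.get(f'{{{W}}}{name}')


def run_css(run):
    """Build inline CSS from a run's direct formatting (w:rPr) only."""
    rpr = run.find('w:rPr', NS)
    if rpr is None:
        return ''
    css = []
    font = attr(rpr.find('w:rFonts', NS), 'ascii')
    size = attr(rpr.find('w:sz', NS), 'val')      # stored in half-points
    color = attr(rpr.find('w:color', NS), 'val')  # hex RGB or "auto"
    if font:
        css.append(f"font-family: '{font}'")
    if size and size.isdigit():
        css.append(f'font-size: {int(size) / 2:g}pt')
    if color and color != 'auto':
        css.append(f'color: #{color}')
    return '; '.join(css)


def document_xml_to_html(content):
    """Convert the text of word/document.xml into simple <p>/<span> HTML."""
    root = ET.fromstring(content)
    paragraphs = []
    for p in root.iter(f'{{{W}}}p'):
        spans = []
        for r in p.findall('w:r', NS):  # direct child runs only
            text = ''.join(t.text or '' for t in r.findall('w:t', NS))
            if not text:
                continue
            css = html.escape(run_css(r), quote=True)
            text = html.escape(text)
            spans.append(f'<span style="{css}">{text}</span>' if css else text)
        paragraphs.append('<p>' + ''.join(spans) + '</p>')
    return '\n'.join(paragraphs)
```

Calling document_xml_to_html(content) with the content string read above would return one <p> per Word paragraph, with a styled <span> for every run that carries its own font, size, or color. Runs that only inherit formatting from a style come out as plain text, which may be the same reason the packages above appear to skip styles.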

Any help will be highly appreciated.