How to extract the titles of an .odt document in LibreOffice - Python

Question

I have a document in .odt format.

I just want to extract the titles of this document.

In other words, I want to extract the sentences (the lines) that are in bold in this document, document.odt:

My code:

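```python
from odf.opendocument import load
from odf import text
from odf import teletype  # not used yet; teletype.extractText(element) returns an element's plain text

doc = load('result.odt')

# Visit every <text:span> element and print its style name, then the element itself
for span in doc.getElementsByType(text.Span):
    print(span.getAttribute('stylename'))
    print(span)
```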

The document is in the attached file:

Thank you for your help!

Regards,
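A minimal sketch of bold-text extraction for a Writer-type document (one whose `content.xml` body is `<office:text>`). It deliberately does not use odfpy: it reads `content.xml` with the standard library (`zipfile`, `xml.etree.ElementTree`), so nothing here depends on odfpy's API. Assumptions: boldness is set directly by automatic styles in `content.xml` (`fo:font-weight` of `bold` or `700`); inherited styles (`style:parent-style-name`) and named styles in `styles.xml` are ignored. The helper names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace URIs (OpenDocument specification)
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def q(prefix, name):
    """Build an ElementTree '{uri}name' qualified name."""
    return "{%s}%s" % (NS[prefix], name)


def bold_style_names(root):
    """Names of automatic styles in content.xml whose text properties are bold."""
    names = set()
    for st in root.iterfind(".//office:automatic-styles/style:style", NS):
        props = st.find("style:text-properties", NS)
        if props is not None and props.get(q("fo", "font-weight")) in ("bold", "700"):
            names.add(st.get(q("style", "name")))
    return names


def extract_bold_text(path_or_file):
    """Return bold paragraphs/headings, or bold spans inside other paragraphs, in document order."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    bold = bold_style_names(root)
    results = []
    for para in root.iter():
        if para.tag not in (q("text", "p"), q("text", "h")):
            continue
        if para.get(q("text", "style-name")) in bold:
            # Whole paragraph is bold: take all of its text at once
            whole = "".join(para.itertext()).strip()
            if whole:
                results.append(whole)
            continue
        # Otherwise look for bold spans inside the paragraph
        for span in para.iter(q("text", "span")):
            if span.get(q("text", "style-name")) in bold:
                part = "".join(span.itertext()).strip()
                if part:
                    results.append(part)
    return results
```

Usage would be `print(extract_bold_text("result.odt"))`; if the file turns out to be a Draw document, its text lives in frames instead and this returns little or nothing.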

Perhaps there was some confusion: despite the **.odt** extension, the **result.odt** file is actually not a text document but a Draw document (most likely the product of a reverse PDF conversion). You can still get the text from this document, but the algorithm will be different. So, to clarify: what kind of document should your script process, Writer or Draw? — JohnSUN, Feb 03 '22 at 10:51
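One way to check whether a file is really a Writer or a Draw document, whatever its extension: an ODF package stores its type in the `mimetype` entry, and the first child of `<office:body>` in `content.xml` is `<office:text>` for Writer and `<office:drawing>` for Draw. A minimal stdlib sketch that reports both (`odf_kind` is a hypothetical helper name, not part of odfpy):

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def odf_kind(path_or_file):
    """Return (mimetype, body_kind) for an ODF package.

    body_kind is the local name of the first child of <office:body>,
    e.g. "text" for Writer or "drawing" for Draw.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        names = zf.namelist()
        mimetype = zf.read("mimetype").decode("ascii").strip() if "mimetype" in names else None
        root = ET.fromstring(zf.read("content.xml"))
    body = root.find("{%s}body" % OFFICE_NS)
    if body is None or len(body) == 0:
        return mimetype, None
    # Tags look like "{urn:...office:1.0}drawing"; keep only the local name
    return mimetype, body[0].tag.split("}", 1)[1]
```

For example, `odf_kind("result.odt")` would return `"drawing"` as its second item if the file is a Draw document as suggested above.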
Yes, the task looks difficult. You will need to call .getDrawPages() on the loaded doc, loop through all the pages, get the individual elements on each page using .getByIndex() from 0 to .getCount()-1 (not sure if there is an iterator here, need to check), and get the content of each of them using .getString()... The most difficult part of the task is determining whether the text read is a heading (by .CharHeight?), and finding the separate parts of a heading if it was split across lines (in that case, the parts will be in different text frames). *(If I were you, I would refuse this task.)* — JohnSUN, Feb 03 '22 at 14:50
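A rough translation of that approach to plain XML reading (standard library, not the UNO API and not odfpy): loop over the `<draw:page>` elements, collect the text of each `<draw:frame>`, and use the largest `fo:font-size` among the frame's automatic styles as a stand-in for `.CharHeight`. Assumptions, not verified against `result.odt`: the text sits in `draw:frame` elements, font sizes are given in `pt` in the automatic styles of `content.xml`, the 14 pt threshold is arbitrary, and consecutive large-font frames on the same page are treated as pieces of one split heading. All function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def q(prefix, name):
    return "{%s}%s" % (NS[prefix], name)


def font_sizes(root):
    """Map automatic style name -> font size in points (only 'NNpt' values are understood)."""
    sizes = {}
    for st in root.iterfind(".//office:automatic-styles/style:style", NS):
        props = st.find("style:text-properties", NS)
        size = props.get(q("fo", "font-size"), "") if props is not None else ""
        if size.endswith("pt"):
            sizes[st.get(q("style", "name"))] = float(size[:-2])
    return sizes


def frame_texts(path_or_file):
    """Yield (page_number, text, largest_font_pt) for each text frame of a Draw document."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    sizes = font_sizes(root)
    pages = root.iterfind(".//office:body/office:drawing/draw:page", NS)
    for page_no, page in enumerate(pages, start=1):
        for frame in page.iter(q("draw", "frame")):
            parts = ["".join(p.itertext()).strip() for p in frame.iter(q("text", "p"))]
            text = " ".join(part for part in parts if part)
            if not text:
                continue
            styles = [el.get(q("text", "style-name")) for el in frame.iter()]
            pts = [sizes[s] for s in styles if s in sizes]
            yield page_no, text, max(pts, default=0.0)


def heading_candidates(path_or_file, min_pt=14.0):
    """Treat large-font frames as headings, joining consecutive ones on the same page."""
    headings, prev_page, joining = [], None, False
    for page_no, text, pt in frame_texts(path_or_file):
        if pt >= min_pt:
            if joining and page_no == prev_page:
                headings[-1] += " " + text  # continuation of a split heading
            else:
                headings.append(text)
            joining = True
        else:
            joining = False
        prev_page = page_no
    return headings
```

The joining rule is only a heuristic: two separate headings that happen to be adjacent on a page would be merged, so the threshold and the merge step will likely need tuning on the real file.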
It looks like you are using a special library called `odf`. Is that `odfpy` from https://github.com/eea/odfpy? That should be included in the tags, because presumably you want any answers to use this library rather than the LibreOffice UNO API, which I think is what @JohnSUN was expecting. — Jim K, Feb 03 '22 at 15:49
Yes, @JimK, I'm looking for a solution using this library (odfpy). What do you mean by "that should be included in the tags"? How do I extract the relevant information? Thank you. — dkk, Feb 05 '22 at 18:06
