How to do simple manipulation of an OOo/LibreOffice Writer document, then save

Question

I want to do a very simple bit of manipulation of a LibreOffice Writer document... then save again as the ODT file...

What might be wrong with this? If I try this I get two content.xml entries in the zip file (the ODT file)... strangely, both of them (if extracted as "content.xml" and "content_1.xml", for example) seem to contain the modified content...

  zipfile = ZipFile( file_path, "a" )
  for zip_info in zipfile.infolist():
    contents = zipfile.read( zip_info.filename )
    if ( zip_info.filename == "content.xml" ):

      document_root = parseString( contents )

      # ... mess around with the contents DOM document...


      zipfile.writestr( zip_info, document_root.toxml() )
      zipfile.close()

I'm aware that there are various add-ins and things you can use (UNO)... but I want to keep it as simple as possible...

later

My solution: having found that Python's zipfile module offers no way to delete an entry from a zip file, I initially decided to take the "make a new zip" approach described in: Delete file from zipfile with the ZipFile Module

However, although I was able to open the resulting ODT file and extract all the files from it, 7-Zip complained about a CRC failure, saying content.xml was now "broken". Presumably this was caused by my brutal substitution of one "content.xml" for another.
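
For reference, the "make a new zip" approach can be sketched roughly as below. This is only a sketch, written for Python 3 rather than the Python 2 used elsewhere in this post; the function name `rewrite_content_xml` and the `modify` callback are hypothetical. It copies every entry into a temporary archive, swaps in the modified content.xml, and lets zipfile compute the CRC and sizes of each entry as it is written.

```python
import os
import tempfile
import zipfile
from xml.dom.minidom import parseString


def rewrite_content_xml(odt_path, modify):
    """Rebuild the ODT at odt_path, passing the content.xml DOM to modify().

    Both this function and the modify callback are hypothetical helpers.
    """
    folder = os.path.dirname(os.path.abspath(odt_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".odt", dir=folder)
    os.close(fd)
    try:
        with zipfile.ZipFile(odt_path, "r") as zin, \
                zipfile.ZipFile(tmp_path, "w") as zout:
            for item in zin.infolist():
                data = zin.read(item.filename)
                if item.filename == "content.xml":
                    dom = parseString(data)
                    modify(dom)  # caller edits the DOM in place
                    data = dom.toxml(encoding="utf-8")
                # Fresh ZipInfo: keep name, timestamp and compression type,
                # so an uncompressed "mimetype" entry stays uncompressed.
                info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
                info.compress_type = item.compress_type
                info.external_attr = item.external_attr
                zout.writestr(info, data)
        # Both archives are closed here; swap the new file into place.
        os.replace(tmp_path, odt_path)
    except Exception:
        os.remove(tmp_path)
        raise
```

Entries are written in their original order, so a "mimetype" entry that came first in the source archive still comes first in the result.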

Final answer: 1) write the modified DOM structure to a plain file called "content.xml" in the same directory as the ODT file:

    # the with block closes the file even if writing fails
    with open( file_dir + '\\content.xml', "w" ) as f:
        f.write( document_root.toxml() )

2) once the ODT file has been closed programmatically, call the 7-Zip CLI to update the archive:

  import subprocess
  # "u" updates the archive; check_call waits for 7z to finish and raises if it fails
  subprocess.check_call( [ "7z", "u", "temp.odt", "content.xml" ], cwd=file_dir )

score 1 · Accepted Answer · answered Sep 29 '13 at 12:53

Depending on where the documents come from, you might want to skip messing around with the zip file altogether and use the Flat XML OpenDocument format (I believe the extension is .fodt), manipulating the XML directly. The files will be larger, but they compress rather well, and you can always save them as .odt files when you've finished working on them.
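
As a rough illustration of working on flat XML directly, here is a minimal Python 3 sketch. The `SAMPLE_FODT` string is a heavily trimmed, made-up stand-in for a real .fodt file, and `replace_in_paragraphs` is a hypothetical helper; the namespace URIs are the standard OpenDocument ones. It only touches text sitting directly inside `text:p` elements, whereas real documents often split text across nested `text:span` elements.

```python
from xml.dom.minidom import parseString

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Trimmed, made-up stand-in for a real flat ODT file (assumption).
SAMPLE_FODT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2"
    office:mimetype="application/vnd.oasis.opendocument.text">
  <office:body>
    <office:text>
      <text:p>Hello world</text:p>
    </office:text>
  </office:body>
</office:document>"""


def replace_in_paragraphs(fodt_xml, old, new):
    """Return the document XML with old replaced by new in paragraph text."""
    dom = parseString(fodt_xml.encode("utf-8"))
    for para in dom.getElementsByTagNameNS(TEXT_NS, "p"):
        for node in para.childNodes:
            # Only direct text children; nested spans are left alone.
            if node.nodeType == node.TEXT_NODE:
                node.data = node.data.replace(old, new)
    return dom.toxml()


updated = replace_in_paragraphs(SAMPLE_FODT, "world", "there")
```

With a real file, the XML would be read from and written back to the .fodt path (opened with `encoding="utf-8"`) instead of using an in-memory string.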

answered Sep 29 '13 at 12:53

Ben

Thanks... definitely preferable – mike rodent Oct 10 '13 at 19:08
