
I want to enter data into a Microsoft Excel spreadsheet and have that data written automatically to other documents and web forms.

I am successfully pulling data from an Excel spreadsheet using xlwings. Right now, I'm stuck working with .docx files. The goal is to write the Excel data into specific parts of a Microsoft Word .docx template and create a new file.

My specific question is:

Can you modify just the text strings in a word/document.xml file and still maintain the integrity and functionality of the .docx package that contains it? It seems that many things can change in the XML when making even the slightest change to a Word document. I've been working with python-docx and lxml, but I'm not sure whether what I want to do is possible this way.

Any suggestions or experiences to share would be greatly appreciated. I feel I've read every article that is easily discoverable through a Google search at least five times.

Let me know if anything needs clarification.

Some things to note: I started coding about two months ago. I've been at it intensively since then and feel I'm picking up the essential concepts, but there are serious gaps in my knowledge.

My tools: OS X Yosemite 10.10, Microsoft Office 2011 for Mac.

Murcielago

1 Answer

You probably need to be more specific, but the short answer is, in principle, yes.

At a certain level, all python-docx does is modify strings in the XML. A couple of things, though:

  • The XML you create needs to remain well-formed and valid according to the schema. So if you change the text enclosed in a <w:t> element, for example, that works fine. Conversely, if you inject a bunch of random XML at an arbitrary point in one of the .xml parts, that will corrupt the file.

  • The XML "files" that make up a .docx file, known as *parts*, are contained in a Zip archive known as a *package*. You must unpack and repack that set of parts properly in order to have a valid .docx file afterward. python-docx takes care of all those details for you, but if you're going directly at the .docx file you'll need to take care of that yourself.
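If you do go directly at the .docx file, both points above can be handled with only the Python standard library. The sketch below is one possible approach, not python-docx's own method. It assumes the template's `word/document.xml` contains placeholder tokens such as `{{CUSTOMER_NAME}}` (a hypothetical convention, as is the function name `fill_docx_template`) and that each token appears as one unbroken string inside a single `<w:t>` element. Word often splits typed text across several runs, so inspect your template's XML first. Each value is XML-escaped so the part stays well-formed, and every other part is copied into the new package unchanged.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_docx_template(template_path, output_path, values):
    """Copy a .docx package, substituting placeholders in word/document.xml.

    `values` maps placeholder tokens (hypothetical, e.g. "{{CUSTOMER_NAME}}")
    to plain-text replacement values. Assumes each token sits unbroken
    inside a single <w:t> element of the template.
    """
    with zipfile.ZipFile(template_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                doc_xml = data.decode("utf-8")
                for token, value in values.items():
                    # Escape &, < and > so the inserted text cannot break the XML.
                    doc_xml = doc_xml.replace(token, escape(str(value)))
                data = doc_xml.encode("utf-8")
            # Every part, edited or not, goes into the new package under its
            # original name, in the original order.
            dst.writestr(item.filename, data)
```

Usage, with hypothetical file names: `fill_docx_template("template.docx", "letter.docx", {"{{CUSTOMER_NAME}}": "John Smith"})`. Working on the raw string keeps every namespace declaration exactly as Word wrote it, which matters because re-serializing through some XML libraries can drop declarations Word still expects.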

scanny
  • Thanks for your reply, Scanny. More specifically, I have an .xlsx file with data entered into it, something like this: John Smith, 1 Main St, Austin, TX 55555. I pull this data using xlwings and name the variables accordingly: `from xlwings import Workbook, Range`; `wb = Workbook('file_path')`; `customerName = Range('B2').value`; `customerStreet = Range('B3').value`; etc. – Murcielago Jul 11 '15 at 01:15
  • That part seemed easy enough. Now the trick is writing those text strings to a pre-made .docx file in the correct places. Ideally I could create the .docx template with placeholder keywords or some other marker in each spot. I wonder whether targeting the XML is the best route here, or whether there are better options. What do you think? – Murcielago Jul 11 '15 at 01:15
  • If the data items are *scalar*, you might want to look at making an XML template and then running them through a template renderer like Mako to produce the final XML. – scanny Jul 11 '15 at 04:42
  • I'm super close! I was able to read the text of the document.xml file, then used a simple replace() call to insert the data at the correct placeholders. Now I'm left with a text string that I want to write back to the document.xml file. I'm not exactly sure how to do this, or whether this is a sensible route. I'm going to keep pushing on it. Please let me know if you have any suggestions or wisdom to share. – Murcielago Jul 11 '15 at 06:44
  • The text of the document.xml (or any other XML part) that gets written to the .docx package on Document.save() is the value of the .blob property of that part. Check out docx.opc.part.Part and XmlPart, and docx.parts.document.DocumentPart, to see what inherits what from where. I think if you use docx.oxml.parse_xml() on the new XML string and then place the result in DocumentPart._element, that may do the trick. Anyway, that's the right part of the code to be looking in. – scanny Jul 11 '15 at 20:38