
I am trying to render the content of a Word document (.docx) stored in Google Drive with Django templating. The .docx file is the template and contains Django variables. Converting the file to Google Docs format would make it lose its font and style formatting, so I am trying to implement the following steps in Google App Engine:

  1. Download the docx file using its downloadUrl from google drive
  2. Pass the downloaded file into the python-docx module to extract the text
  3. Pass the text extracted into Django for it to render the Django variables
  4. Write the text back into docx using the python-docx
  5. Finally upload the docx file into another google drive account.

I am having a problem passing the downloaded file into python-docx as implemented here.
Below is my code in Google App Engine:

    downloadUrl = searchResult.get('items')[1]['downloadUrl']
    if downloadUrl:
      resp, tempContent = drive_service._http.request(downloadUrl)
      if resp.status == 200:
        f  = StringIO.StringIO(tempContent)
        document = Document(f)
        para = document.paragraphs()
        print para
        f.close()

The above code gave the following error:

      para = document.paragraphs()
      TypeError: 'list' object is not callable

This is my code for rendering the extracted text with Django templating, which works:

        myTemplate = Template(tempContent)
        c = Context({ 
                     "salutation": "William", 
                     "inventionTitle":"Biometric KeyLock"
                     })
        fullContent =  myTemplate.render(c)

The mimetype for the downloaded file is:

    application/vnd.openxmlformats-officedocument.wordprocessingml.document

My problem is that I don't know how to process the downloaded file. I want to replace the placeholders/variables in a Word .docx file stored in Google Drive without losing the formatting, then upload it back to Google Drive.

If there is any better way of implementing this, kindly let me know.

Thank you.

  • The error message tells you all you need to know: `document.paragraphs` is not a method, don't try to call it. But you'll have a whole load more to fix before you can make this work: just off the top of my head, how are you expecting to get the data back into the word doc in the right place? – Daniel Roseman Oct 18 '14 at 20:52
  • Thanks Daniel, but according to the documentation: `def paragraphs(self): """A list of |Paragraph| instances corresponding to the paragraphs in the document, in document order. Note that paragraphs within revision marks such as <w:ins> or <w:del> do not appear in this list.""" return self._document_part.paragraphs` – William Osilaja Boampong Oct 21 '14 at 13:04
  • Link to documentation for [python-docx](https://python-docx.readthedocs.org/en/latest/_modules/docx/api.html#Document.paragraphs) – William Osilaja Boampong Oct 21 '14 at 13:20

1 Answer


An error like...

    TypeError: 'list' object is not callable

... generally means that you have a list ([]), which is not a callable object, meaning you can't put parentheses after it to invoke it:

    >>> []()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'list' object is not callable

It's likely that that object is the data payload you want. Try removing the parentheses and let us know what you get!

    para = document.paragraphs
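To see why the parentheses fail, here is a minimal sketch. `FakeDocument` is a hypothetical stand-in I made up for illustration, not python-docx itself; it only assumes that `paragraphs` is exposed as a property returning a list, which is what the error message suggests:

```python
class FakeDocument(object):
    """Hypothetical stand-in for a python-docx Document (illustration only)."""

    def __init__(self, texts):
        self._texts = list(texts)

    @property
    def paragraphs(self):
        # A property is read like an attribute and hands back the list itself.
        return self._texts


doc = FakeDocument(["Dear {{ salutation }},", "Re: {{ inventionTitle }}"])

paras = doc.paragraphs          # correct: attribute access, no parentheses
print(len(paras))

try:
    doc.paragraphs()            # wrong: this tries to call the returned list
except TypeError as exc:
    print(exc)
```

Reading the attribute gives you the list; adding `()` asks Python to call that list, which raises the same `TypeError` you saw.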

Each paragraph is likely an object that you can manipulate, e.g., merge or render a Django/Jinja2 template with a context. For example, if you want the text, you may have to extract it with .text, as specified in the Paragraph object docs page:

    for para in document.paragraphs:
        print(para.text)

I don't have experience with python-docx, but it would be cool if you could just do something like this:

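    from django.template import Context, Template

    # Build the context once; it is the same for every paragraph.
    c = Context({
        "salutation": "William",
        "inventionTitle": "Biometric KeyLock",
    })
    for para in document.paragraphs:
        # Render each paragraph's text as its own Django template.
        myTemplate = Template(para.text)
        para.text = myTemplate.render(c)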

In reality however, that's probably not going to fly, because you're likely to have various text formatting in just one paragraph, meaning you're probably going to need to start investigating "runs," regions of text with a common set of properties. Also see the docs page on Runs.
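As a rough sketch of what working at the run level could look like, the helper below operates on plain strings (one string per run) and handles a placeholder that has been split across neighbouring runs. `replace_across_runs` is a hypothetical name of my own, it does literal replacement rather than real Django rendering, and it is untested against python-docx:

```python
def replace_across_runs(run_texts, placeholder, value):
    """Replace `placeholder` with `value` in a list of run texts.

    The placeholder may be split across neighbouring runs. The replacement
    is written into the run where the placeholder starts, so it would keep
    that run's formatting; the consumed parts of later runs are removed.
    """
    texts = list(run_texts)
    if not placeholder:
        return texts
    search_from = 0
    while True:
        joined = "".join(texts)
        start = joined.find(placeholder, search_from)
        if start == -1:
            return texts
        end = start + len(placeholder)
        pos = 0
        inserted = False
        for i, text in enumerate(texts):
            run_start, run_end = pos, pos + len(text)
            pos = run_end
            if run_end <= start or run_start >= end:
                continue  # this run does not overlap the placeholder
            cut_from = max(start, run_start) - run_start
            cut_to = min(end, run_end) - run_start
            middle = "" if inserted else value
            inserted = True
            texts[i] = text[:cut_from] + middle + text[cut_to:]
        # Continue searching after the text that was just inserted.
        search_from = start + len(value)


runs = ["Dear {{ salu", "tation }}", ", welcome"]
print(replace_across_runs(runs, "{{ salutation }}", "William"))
```

With python-docx you would presumably feed it `[r.text for r in para.runs]` and assign each result back to the matching `run.text`, leaving the run objects (and their formatting) in place.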

To preserve the formatting, you may have to look at the document as a whole and perform an individual search-and-replace for each template variable. While this question doesn't involve the Google Slides API, the way its text is structured within documents is similar to Word & Google Docs, thus its text concepts guide may be a useful reference.
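One way to sketch a whole-document search-and-replace uses only the standard library, relying on the fact that a .docx file is a zip archive whose main body lives in `word/document.xml`. `replace_in_docx` is a hypothetical helper of my own. It assumes the part is UTF-8 encoded and that each placeholder appears unbroken in the XML, which Word does not guarantee since it may split text across runs:

```python
import io
import zipfile
from xml.sax.saxutils import escape


def replace_in_docx(docx_bytes, replacements):
    """Return new .docx bytes with literal placeholders replaced.

    Only word/document.xml is touched; every other part (styles, fonts,
    media) is copied through unchanged, so the formatting is preserved.
    """
    out_buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as source:
        with zipfile.ZipFile(out_buffer, "w", zipfile.ZIP_DEFLATED) as target:
            for item in source.infolist():
                data = source.read(item.filename)
                if item.filename == "word/document.xml":
                    xml = data.decode("utf-8")
                    for placeholder, value in replacements.items():
                        # Escape &, < and > so the XML stays well-formed.
                        xml = xml.replace(placeholder, escape(value))
                    data = xml.encode("utf-8")
                target.writestr(item, data)
    return out_buffer.getvalue()
```

The returned bytes can be uploaded as-is with the same .docx MIME type. Headers, footers, and footnotes live in other parts of the archive and are not touched by this sketch.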

Finally, a heads-up that Drive API v2 is no longer the newest API version. It's on v3 now where downloadUrl is deprecated. To see the alternative, check the Drive API v2-v3 migration guide. To see some actual v3 code that you'll likely use and just tweak both the source & destination MIMEtypes, check out my "Exporting Google Sheets files as CSV" blog post.
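For reference, a minimal sketch of the v3 download step might look like the following. It assumes `drive_service` is a Drive API v3 client built with google-api-python-client and that `file_id` is the ID of the .docx file; `download_docx` is a hypothetical helper name, and I have not run this against a live account:

```python
import io


def download_docx(drive_service, file_id):
    """Fetch a binary (non-Google-format) Drive file as a file-like object.

    Assumption: `drive_service` is a Drive v3 client, e.g. the result of
    googleapiclient.discovery.build("drive", "v3", ...).
    """
    # In v3, files().get_media() takes the place of the v2 downloadUrl field.
    request = drive_service.files().get_media(fileId=file_id)
    content = request.execute()  # raw bytes of the file
    return io.BytesIO(content)
```

The returned object can be passed straight to python-docx, e.g. `Document(download_docx(drive_service, file_id))`. On Python 3 the content is bytes, so `io.BytesIO` replaces the `StringIO.StringIO` used in the question.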

wescpy