
What I'm trying to do is access the Microsoft Word API (which is written in VBA) from Python with the "pywin32" module. Specifically, I need to iterate through a whole .docx file, find where a certain string appears, and add some text after it. I successfully fetched some paragraphs from the file with Document.Paragraphs.Item(index) and printed them out, but when I compare one with my hard-coded string, the result is always False. I did a type check on the paragraph I got from the .docx file and realized it is not a Python string, which is probably why it never matches my string. Below is some code that shows what is happening:

import win32com.client as win32

word = win32.gencache.EnsureDispatch('Word.Application')
word.Documents.Open('xxxxxxxxx.docx')
string = word.Documents(1).Paragraphs.Item(3)  # a Paragraph COM object, not a str
print(string)
if string == "My Hard Coded String":
    print("True")
else:
    print("False")

The snippet above always prints False, even when the text shown by the print(string) line is exactly "My Hard Coded String". I'm reading the VBA documentation, but there doesn't seem to be any object or method for converting a Paragraph instance into a Python string (a strange way to put it, since VBA has nothing to do with Python, but I'm trying to state my question clearly). Any idea how I should achieve this? Thanks in advance!

Edit: Somebody has answered my question, but I don't know where to find all the objects/properties that Paragraph.Range has. I have been looking at MSDN and I don't think it lists any properties that belong to "Range".

Boooooo

1 Answer

The Word object model is not written in VBA (although the documentation is targeted at VBA developers). It is a language-agnostic binary object API (COM) that can be accessed from multiple languages. (See here for a comparison between using VBA and Python to access the object model.)

In your case, this:

word.Documents(1).Paragraphs.Item(3)

returns an instance of a Paragraph object, which is not equivalent to a string. This makes sense because a Word paragraph is more than just a string: it may include paragraph-level formatting, drop caps, character-level formatting, and so on.

You need to start by getting the Range object corresponding to the paragraph, via the Paragraph's Range property. The Range object "corresponds to a contiguous area of the document."

Then you need the Text property of the Range object.

Like so:

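import win32com.client as win32

word = win32.gencache.EnsureDispatch('Word.Application')
word.Documents.Open('xxxxxxxxx.docx')
# Range.Text ends with the paragraph mark "\r" (plus "\x07" inside a table cell), so strip them
string = word.Documents(1).Paragraphs(3).Range.Text.rstrip("\r\x07")
print(string)
if string == "My Hard Coded String":
    print("True")
else:
    print("False")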

NB. I haven't tested this, but I don't think you need to call Paragraphs.Item explicitly. The object model supports a concept called default properties, which means that (in Python, at least) you can pass arguments to an object that has a default property, and those arguments are passed on to that property. In other words, the following are equivalent:

string = word.Documents(1).Paragraphs(3).Range.Text
string = word.Documents.Item(1).Paragraphs.Item(3).Range.Text
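To make the idea concrete, here is a small pure-Python model of a default member. FakeParagraphs and FakeParagraph are hypothetical stand-ins, not pywin32 classes: calling the collection simply forwards to Item. My assumption is that the makepy wrappers created by gencache.EnsureDispatch do roughly the same for collections whose default member is Item, but treat this as a sketch of the concept rather than pywin32's actual code.

```python
class FakeParagraph:
    """Hypothetical stand-in for a Word Paragraph (only holds some text)."""

    def __init__(self, text):
        self.text = text


class FakeParagraphs:
    """Hypothetical stand-in for Word's Paragraphs collection."""

    def __init__(self, texts):
        self._items = [FakeParagraph(t) for t in texts]

    def Item(self, index):
        # Word collections are 1-based, unlike Python lists.
        return self._items[index - 1]

    def __call__(self, index):
        # Default member: Paragraphs(3) is shorthand for Paragraphs.Item(3).
        return self.Item(index)


paragraphs = FakeParagraphs(["first", "second", "My Hard Coded String"])
print(paragraphs(3) is paragraphs.Item(3))  # True: both calls return the same object
print(paragraphs(3).text)  # My Hard Coded String
```

Note the 1-based indexing: Paragraphs(3) is the third paragraph, not the fourth.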

I think this is also why the print call in your code prints out the text. It isn't because string is a different kind of string; it's because the default property chain is Paragraph.Range.Text, and when a simple value (as opposed to an object) is expected, the chain is followed to its end, which is the string returned by the Text property.
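The difference between printing and comparing can be reproduced without Word. The hypothetical FakeParagraph and FakeRange classes below are a simplified model, not pywin32's implementation: str() follows the chain Paragraph to Range to Text, but == is not redirected, so comparing the object itself to a str is False while comparing its Range.Text (minus the trailing paragraph mark) succeeds.

```python
class FakeRange:
    """Hypothetical stand-in for a Word Range; Text is its default property."""

    def __init__(self, text):
        self.Text = text

    def __str__(self):
        return self.Text


class FakeParagraph:
    """Hypothetical stand-in for a Word Paragraph; Range is its default property."""

    def __init__(self, text):
        self.Range = FakeRange(text)

    def __str__(self):
        # Follow the default-property chain Paragraph -> Range -> Text.
        return str(self.Range)


para = FakeParagraph("My Hard Coded String\r")  # Word paragraph text ends with "\r"
print(para)  # shows the text, so the object *looks* like a string
print(para == "My Hard Coded String")  # False: an object is not a str
print(para.Range.Text.rstrip("\r") == "My Hard Coded String")  # True
```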


Reference:

Note that in the current documentation layout, the left side has a list of objects, each of which can be expanded to list that object's properties and methods.
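You can also ask the Python wrapper itself which properties it exposes. As far as I know, the makepy-generated classes that gencache.EnsureDispatch produces keep their readable properties in a _prop_map_get_ dictionary. That is an internal pywin32 detail, not a documented API, so treat this as an assumption that may change between versions; the helper name list_properties is hypothetical.

```python
def list_properties(com_object):
    """Return the sorted names of readable properties on a pywin32 wrapper.

    Assumes the wrapper was generated by makepy (for example via
    win32com.client.gencache.EnsureDispatch), whose classes keep their
    readable properties in a _prop_map_get_ dict. Objects without that
    attribute yield an empty list.
    """
    prop_map = getattr(com_object, "_prop_map_get_", None)
    if not isinstance(prop_map, dict):
        return []
    return sorted(prop_map)
```

Against a live document you would call it as, for example, list_properties(word.Documents(1).Paragraphs(1).Range) after creating word with gencache.EnsureDispatch. Methods are not included in that dictionary, so the documentation is still the place to look for those.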

Zev Spitz
  • Okay! I think I'm kind of confused by all the information out there that targets VBA, lol. – Boooooo Jun 14 '18 at 07:37
  • @Boooooo I've expanded my answer a bit more. – Zev Spitz Jun 14 '18 at 07:58
  • Okay! Thanks a lot for answering my question. I tried your solution and it works, after I popped the last two characters from the string I got through Paragraph.Range.Text. I checked the Unicode code points of those last two characters and found they are Carriage Return and the Bell character. I suppose these two characters are what Microsoft uses for formatting? – Boooooo Jun 14 '18 at 08:10
  • Also, I don't really understand where you found the "Text" property of "Range". I'm looking at the MSDN website and I can find "Range", but there isn't any information showing what properties "Range" has. – Boooooo Jun 14 '18 at 08:12
  • @Boooooo I'm guessing the last two characters are related to the end of the paragraph. – Zev Spitz Jun 14 '18 at 08:25
  • @CristiFati I'm reusing the OP's original code. Perhaps leave this as a comment on the question. – Zev Spitz Jun 14 '18 at 08:27
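Putting the pieces together for the original goal (find the paragraph whose text matches a string and add text after it), here is an untested sketch that works on anything shaped like Word's Paragraphs collection. The two characters found above match Word's paragraph mark ("\r") and the end-of-cell marker ("\x07") that table cells carry, so both are stripped before comparing. The function names are hypothetical, and inserting through Document.Range(Start, End).InsertAfter is only one option (Range.Find is another).

```python
PARAGRAPH_MARK = "\r"
END_OF_CELL = "\x07"  # Word's end-of-cell marker (shows up as the Bell character)


def clean_text(raw):
    """Strip the paragraph mark and end-of-cell marker Word appends to Range.Text."""
    return raw.rstrip(PARAGRAPH_MARK + END_OF_CELL)


def find_paragraph(paragraphs, target):
    """Return the first paragraph whose cleaned Range.Text equals target, else None.

    Works with any iterable of objects exposing .Range.Text; Word's
    Paragraphs collection should be iterable through pywin32.
    """
    for para in paragraphs:
        if clean_text(para.Range.Text) == target:
            return para
    return None


def append_to_paragraph(doc, para, extra):
    """Insert extra at the end of para's text, just before its paragraph mark.

    Assumes doc is a Word Document wrapper, where Document.Range(Start, End)
    returns a Range and Range.InsertAfter adds text at the end of that range.
    Written for ordinary paragraphs; paragraphs ending a table cell may need
    a different offset.
    """
    end = para.Range.End - 1  # position just before the trailing "\r"
    doc.Range(end, end).InsertAfter(extra)
```

With pywin32 this would be used roughly as: doc = word.Documents(1); para = find_paragraph(doc.Paragraphs, "My Hard Coded String"); and, if para is not None, append_to_paragraph(doc, para, " (added)"). Stopping at the first match keeps the document from changing while the loop is still iterating over it.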