#-*- coding:utf-8 -*-
import win32com.client, pythoncom
import time

ie = win32com.client.DispatchEx('InternetExplorer.Application.1')
ie.Visible = 1
ie.Navigate('http://ieeexplore.ieee.org/xpl/periodicals.jsp')
time.sleep( 5 )

ie.Document.getElementById("browse_keyword").value ="Computer"
ie.Document.getElementsByTagName("input")[24].click()

import win32com.client, pythoncom
import time

ie = win32com.client.DispatchEx('InternetExplorer.Application')
ie.Visible = 1
ie.Navigate('www.baidu.com')
time.sleep(5)

print 'browse_keword'
ie.Document.getElementById("kw").value ="Computer"
ie.Document.getElementById("su").click()
print 'Done!'

When I run the first code section, it fails with:

ie.Document.getElementById("browse_keyword").value ="Computer"
TypeError: getElementById() takes exactly 1 argument (2 given)

The second code section runs fine. What is the difference between the two that causes the different results?

Luke Woodward
Tre Mi
  • What happens if you take away the `.1` from `ie = win32com.client.DispatchEx('InternetExplorer.Application.1')`? – agf Mar 22 '12 at 05:37
  • I've removed ".1" from that, but the situation remains. Now when the script runs, there is only one browser running, but "TypeError: getElementById() takes exactly 1 argument (2 given)" remains. – Tre Mi Mar 22 '12 at 05:48
  • I get the error even with the second code example. – wRAR Feb 06 '13 at 22:01
  • Are you using a 32 or 64-bit OS? And what about your python build? – planestepper Feb 10 '13 at 02:56
  • Also, have you tried running `help(win32com.client.DispatchEx('InternetExplorer.Application').Document.getElementById)`? This might give you a clue. – planestepper Feb 10 '13 at 03:24
  • @leon 64bit OS, 32bit Python, 32bit IE is opened. The help returns only "getElementById(self) method of win32com.client.CDispatch instance" which is wrong of course. – wRAR Feb 10 '13 at 13:47
  • @wRAR - self on the method only? Try running `help(win32com.client.DispatchEx('InternetExplorer.Application').Document.getElementById())`. We may be looking at a constructor here. – planestepper Feb 10 '13 at 14:54
  • @leon getElementById() will throw `OLE error 0x80020101` – wRAR Feb 10 '13 at 18:25
  • @wRAR, the only thing I can think of: test the code under 32-bit Windows, using IE 32-bit and Python 32 bit, and then run the same with all matching 64-bit (though I'm unsure about win32com). Apart from that, I have no ideas. – planestepper Feb 10 '13 at 20:06
  • Mmm... or try with Chrome/Firefox! May be an IE oddity? – Alex Feb 11 '13 at 11:08
  • One other thing - w3c validator shows 299 errors in the markup on the IEEE page, and only 9 errors on the Baidu page (about normal for a real website). Maybe the bad markup is screwing with the method here? Though it would be an odd error message if that were the case... – Alex Feb 11 '13 at 11:12
  • @Alex I don't think Chrome/Firefox expose the same COM methods as IE. – wRAR Feb 11 '13 at 14:06

4 Answers

5

The difference between the two cases has nothing to do with the COM name you specify: both InternetExplorer.Application and InternetExplorer.Application.1 resolve to the exact same CLSID, which gives you an IWebBrowser2 interface. The difference in runtime behaviour is purely down to the URL you retrieved.

The difference here may be that the page which works is HTML whereas the other one is XHTML; or it may simply be that errors in the failing page prevent the DOM from initialising properly. Either way, it appears to be a 'feature' of the IE9 parser.

Note that this doesn't happen if you enable compatibility mode (after the second line below I clicked the compatibility mode icon in the address bar):

(Pdb) ie.Document.DocumentMode
9.0
(Pdb) ie.Document.getElementById("browse_keyword").value
*** TypeError: getElementById() takes exactly 1 argument (2 given)
(Pdb) ie.Document.documentMode
7.0
(Pdb) ie.Document.getElementById("browse_keyword").value
u''

Unfortunately I don't know how to toggle compatibility mode from a script (the documentMode property is not settable). Maybe someone else does?

The wrong argument count is, I think, coming from COM: Python passes in the arguments and the COM object rejects the call with a misleading error.

Duncan
5

getElementById is a method of a COMObject, so win32com builds it dynamically.
On my computer, when the URL is http://ieeexplore.ieee.org/xpl/periodicals.jsp, the generated method is roughly equivalent to:

def getElementById(self):
    return self._ApplyTypes_(3000795, 1, (12, 0), (), 'getElementById', None,)

When the URL is www.baidu.com, it is roughly equivalent to:

def getElementById(self, v=pythoncom.Missing):
    ret = self._oleobj_.InvokeTypes(1088, LCID, 1, (9, 0), ((8, 1),), v)
    if ret is not None:
        # The interface ID must be passed as a string.
        ret = Dispatch(ret, 'getElementById', '{3050F1FF-98B5-11CF-BB82-00AA00BDCE0B}')
    return ret

Obviously, if you pass an argument to the first version, you get a TypeError, because it accepts nothing but self. If you instead call it with no arguments, i.e. ie.Document.getElementById(), you get a com_error rather than a TypeError.
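To see why the failure is a plain Python TypeError rather than a COM error, here is a toy model in pure Python. No COM is involved, and the class and helper names are invented for illustration. The first class mirrors the shape of the broken wrapper, which accepts nothing but self, so Python rejects the call before anything reaches IE. Python 3 words the message differently from the Python 2 message quoted in the question.

```python
class BrokenDocument:
    """Toy stand-in for a Document whose wrapper was built with no parameters."""

    def getElementById(self):
        # Mirrors the generated wrapper that takes only `self`.
        return None


class WorkingDocument:
    """Toy stand-in for a Document whose wrapper was built with one parameter."""

    def getElementById(self, v=None):
        return "element:%s" % v


def try_lookup(document, element_id):
    """Return the lookup result, or the TypeError message if the call is rejected."""
    try:
        return document.getElementById(element_id)
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_lookup(WorkingDocument(), "kw"))
print(try_lookup(BrokenDocument(), "browse_keyword"))
```

The second call never runs the method body at all: the argument count check happens in Python's call machinery, which matches the observation that the error appears without any OLE error code.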

Why did win32com build the wrong code?
Let us look at ie and ie.Document. They are both COMObjects, or more precisely, win32com.client.CDispatch instances. CDispatch is just a wrapper class. The core is the _oleobj_ attribute, whose type is PyIDispatch.

>>> ie, ie.Document
(<COMObject InternetExplorer.Application>, <COMObject <unknown>>)
>>> ie.__class__, ie.Document.__class__
(<class win32com.client.CDispatch at 0x02CD00A0>,
 <class win32com.client.CDispatch at 0x02CD00A0>)
>>> oleobj = ie.Document._oleobj_
>>> oleobj
<PyIDispatch at 0x02B37800 with obj at 0x003287D4>

To build getElementById, win32com needs to get the type information for the getElementById method from _oleobj_. Roughly, win32com uses the following procedure:

typeinfo = oleobj.GetTypeInfo()
typecomp = typeinfo.GetTypeComp()
x, funcdesc = typecomp.Bind('getElementById', pythoncom.INVOKE_FUNC)
......

funcdesc contains almost all the important information, e.g. the number and types of the parameters.
When the URL is http://ieeexplore.ieee.org/xpl/periodicals.jsp, funcdesc.args is (), while the correct funcdesc.args should be ((8, 1, None),).
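The lookup above can be wrapped in a small diagnostic helper. This is only a sketch: `bound_arg_count` is a hypothetical name, the function is duck-typed so it accepts any object exposing the same three calls used in the procedure above, and it assumes that Bind reports a kind of 1 (DESCKIND_FUNCDESC) for functions and a different kind when the name is not a function.

```python
INVOKE_FUNC = 1        # numeric value of pythoncom.INVOKE_FUNC
DESCKIND_FUNCDESC = 1  # kind reported by Bind when the name is a function


def bound_arg_count(oleobj, method_name, invoke_kind=INVOKE_FUNC):
    """Return how many parameters the type information reports for a method.

    `oleobj` is expected to be a PyIDispatch, e.g. ie.Document._oleobj_.
    Returns None when the name does not bind to a function (assumption:
    Bind then reports a kind other than DESCKIND_FUNCDESC).
    """
    typeinfo = oleobj.GetTypeInfo()
    typecomp = typeinfo.GetTypeComp()
    kind, funcdesc = typecomp.Bind(method_name, invoke_kind)
    if kind != DESCKIND_FUNCDESC:
        return None
    return len(funcdesc.args)
```

Going by the values reported in this answer, calling `bound_arg_count(ie.Document._oleobj_, 'getElementById')` would give 0 on the IEEE page and 1 on a page where the method works.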

Long story short, win32com retrieved the wrong type information, and so it built the wrong method.
I am not sure who is to blame, PyWin32 or IE. But based on my observation, I found nothing wrong in PyWin32's code. On the other hand, the following script runs perfectly in Windows Script Host.

var ie = new ActiveXObject("InternetExplorer.Application");
ie.Visible = 1;
ie.Navigate("http://ieeexplore.ieee.org/xpl/periodicals.jsp");
WScript.sleep(5000);
ie.Document.getElementById("browse_keyword").value = "Computer";

Duncan has already pointed out that IE's compatibility mode can prevent the problem. Unfortunately, it seems to be impossible to enable compatibility mode from a script.
But I found a trick that bypasses the problem.

First, visit a well-behaved site that serves an HTML page, and retrieve a correct Document object from it.

ie = win32com.client.DispatchEx('InternetExplorer.Application')
ie.Visible = 1
ie.Navigate('http://www.haskell.org/arrows')
time.sleep(5)
document = ie.Document

Then navigate to the page that doesn't work:

ie.Navigate('http://ieeexplore.ieee.org/xpl/periodicals.jsp')
time.sleep(5)

Now you can access the DOM of the second page via the old Document object.

document.getElementById('browse_keyword').value = "Computer"

If you use the new Document object, you will get a TypeError again.

>>> ie.Document.getElementById('browse_keyword')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: getElementById() takes exactly 1 argument (2 given)
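The two-step trick described in this answer can be folded into one helper. This is a minimal sketch, not a tested recipe: `open_with_cached_document` is a hypothetical name, the fixed waits simply copy the `time.sleep(5)` calls above, and the `sleep` parameter exists only so the waiting can be swapped out.

```python
import time


def open_with_cached_document(ie, good_url, target_url,
                              wait_seconds=5, sleep=time.sleep):
    """Load `good_url`, keep its Document wrapper, then load `target_url`.

    Returns the Document object obtained from the first page, which this
    answer reports can still be used to reach the second page's DOM.
    """
    ie.Navigate(good_url)
    sleep(wait_seconds)
    document = ie.Document  # wrapper built while the well-behaved page is loaded
    ie.Navigate(target_url)
    sleep(wait_seconds)
    return document
```

With a real `ie` object, usage would look like `document = open_with_cached_document(ie, 'http://www.haskell.org/arrows', 'http://ieeexplore.ieee.org/xpl/periodicals.jsp')`, followed by `document.getElementById('browse_keyword')`.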
nymk
  • What about `CastTo(ie.Document.parentWindow.document, "IHTMLDocument3")` (w/o cast it returned ihtmldocument2 in ie11) – Winand Feb 11 '14 at 19:49
  • UPD. also we need `gencache.EnsureModule('{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}', 0, 4, 0)` before casting – Winand Feb 11 '14 at 20:03
2

I just got this issue when I upgraded to IE11 from IE8.

I've only tested this on the getElementsByTagName function. You have to call the function from the Body element.

#-*- coding:utf-8 -*-
import win32com.client, pythoncom
import time

ie = win32com.client.DispatchEx('InternetExplorer.Application.1')
ie.Visible = 1
ie.Navigate('http://ieeexplore.ieee.org/xpl/periodicals.jsp')
time.sleep( 5 )

ie.Document.Body.getElementById("browse_keyword").value ="Computer"
ie.Document.Body.getElementsByTagName("input")[24].click()
0

Calling a method on an instance in Python automatically adds the instance as the first argument, which is why you have to explicitly write the 'self' parameter when defining methods.

For example, instance.method(args...) is equivalent to Class.method(instance, args...).

From what I can see, the programmer must have forgotten to write the self parameter, which breaks the method. Try looking inside the library code.
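A small pure-Python illustration of that equivalence follows. The `Greeter` class is invented for this example and has nothing to do with win32com.

```python
class Greeter:
    def greet(self, name):
        return "hello, %s" % name


g = Greeter()
via_instance = g.greet("IE")         # the instance is passed implicitly as self
via_class = Greeter.greet(g, "IE")   # the instance is passed explicitly
print(via_instance == via_class)
```

This is also why the error message counts one more argument than the caller wrote: the implicit instance is included in the total.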

A5C1D2H2I1M1N2O1R2T1
unddoch