As a method of a COMObject, getElementById is built by win32com dynamically.
On my computer, if the URL is http://ieeexplore.ieee.org/xpl/periodicals.jsp, the generated method is almost equivalent to
def getElementById(self):
    return self._ApplyTypes_(3000795, 1, (12, 0), (), 'getElementById', None,)
If the URL is www.baidu.com, it is almost equivalent to
def getElementById(self, v=pythoncom.Missing):
    ret = self._oleobj_.InvokeTypes(1088, LCID, 1, (9, 0), ((8, 1),), v)
    if ret is not None:
        ret = Dispatch(ret, 'getElementById', '{3050F1FF-98B5-11CF-BB82-00AA00BDCE0B}')
    return ret
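Obviously, if you pass an argument to the first version, you'll receive a TypeError, because that method accepts no parameters besides self. But if you call it without arguments, namely, invoke ie.Document.getElementById(), you won't receive a TypeError but a com_error, because the underlying COM method still expects the element id.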
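To illustrate the TypeError half of this, here is a minimal pure-Python model of a method being generated from an argument description. It is only a sketch: build_method, make_document_class and the invoke method are hypothetical names invented for this example, not win32com's actual code, and it does not reproduce the com_error case.

```python
def build_method(name, args):
    """Generate a function whose arity mirrors a funcdesc-like args tuple.

    Hypothetical helper: a simplified model, not win32com's real builder.
    """
    count = len(args)
    params = ", ".join(["self"] + ["arg%d" % i for i in range(count)])
    forwarded = ", ".join("arg%d" % i for i in range(count))
    source = (
        "def {name}({params}):\n"
        "    return self.invoke({name!r}, [{forwarded}])\n"
    ).format(name=name, params=params, forwarded=forwarded)
    namespace = {}
    exec(source, namespace)  # compile the generated source text
    return namespace[name]


def make_document_class(args):
    """Build a stand-in Document class whose getElementById follows `args`."""
    members = {
        # Stand-in for the real COM call: just echo what was forwarded.
        "invoke": lambda self, name, call_args: (name, call_args),
        "getElementById": build_method("getElementById", args),
    }
    return type("FakeDocument", (object,), members)


# Correct type information: one string parameter.
GoodDocument = make_document_class(((8, 1, None),))
# Wrong type information: no parameters at all.
BadDocument = make_document_class(())
```

GoodDocument().getElementById('browse_keyword') forwards the id, while BadDocument().getElementById('browse_keyword') raises a TypeError, simply because the generated function was defined without that parameter.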
Why did win32com build the wrong code?
Let us look at ie and ie.Document. They are both COMObjects, more precisely, win32com.client.CDispatch instances. CDispatch is just a wrapper class. The core is the _oleobj_ attribute, whose type is PyIDispatch.
>>> ie, ie.Document
(<COMObject InternetExplorer.Application>, <COMObject <unknown>>)
>>> ie.__class__, ie.Document.__class__
(<class win32com.client.CDispatch at 0x02CD00A0>,
<class win32com.client.CDispatch at 0x02CD00A0>)
>>> oleobj = ie.Document._oleobj_
>>> oleobj
<PyIDispatch at 0x02B37800 with obj at 0x003287D4>
To build getElementById, win32com needs to get the type information for the getElementById method from _oleobj_. Roughly, win32com uses the following procedure:
typeinfo = oleobj.GetTypeInfo()
typecomp = typeinfo.GetTypeComp()
x, funcdesc = typecomp.Bind('getElementById', pythoncom.INVOKE_FUNC)
......
funcdesc contains almost all the important information, e.g. the number and types of the parameters.
If the URL is http://ieeexplore.ieee.org/xpl/periodicals.jsp, funcdesc.args is (), while the correct funcdesc.args should be ((8, 1, None),).
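For readers unfamiliar with these tuples, the sketch below decodes them. The numeric values come from the Windows VARENUM and PARAMFLAG definitions (8 is VT_BSTR, 9 is VT_DISPATCH, 12 is VT_VARIANT, 1 is PARAMFLAG_FIN); the helper names describe_arg and describe_args are hypothetical, and the sketch assumes each entry is laid out as (type, flags, default), as in the tuple above.

```python
# A few VARENUM values; the table is deliberately incomplete.
VARTYPE_NAMES = {8: "VT_BSTR", 9: "VT_DISPATCH", 12: "VT_VARIANT"}

# PARAMFLAG bits and a short label for each.
PARAMFLAG_NAMES = [
    (1, "in"),
    (2, "out"),
    (4, "lcid"),
    (8, "retval"),
    (16, "opt"),
    (32, "hasdefault"),
]


def describe_arg(arg):
    """Turn one (type, flags, ...) entry into a readable string."""
    vartype, flags = arg[0], arg[1]
    type_name = VARTYPE_NAMES.get(vartype, "VT_%d" % vartype)
    labels = [label for bit, label in PARAMFLAG_NAMES if flags & bit]
    return "%s [%s]" % (type_name, ", ".join(labels) or "none")


def describe_args(args):
    """Describe every entry of a funcdesc.args-style tuple."""
    return [describe_arg(arg) for arg in args]
```

With this, describe_args(((8, 1, None),)) reads as one input string parameter, which is what getElementById needs, while describe_args(()) is an empty list: the method appears to take no parameters at all.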
Long story short, win32com retrieved the wrong type information, and thus it built the wrong method.
I am not sure who is to blame, PyWin32 or IE. Based on my reading, I found nothing wrong in PyWin32's code. On the other hand, the following script runs perfectly in Windows Script Host, so IE can serve the method correctly to other clients.
var ie = new ActiveXObject("InternetExplorer.Application");
ie.Visible = 1;
ie.Navigate("http://ieeexplore.ieee.org/xpl/periodicals.jsp");
WScript.sleep(5000);
ie.Document.getElementById("browse_keyword").value = "Computer";
Duncan has already pointed out that IE's compatibility mode can prevent the problem. Unfortunately, it seems to be impossible to enable compatibility mode from a script.
But I found a trick, which can help us bypass the problem.
First, you need to visit a good site, one that gives us an HTML page for which the type information is correct, and retrieve a correct Document object from it.
import time
import win32com.client

ie = win32com.client.DispatchEx('InternetExplorer.Application')
ie.Visible = 1
ie.Navigate('http://www.haskell.org/arrows')
time.sleep(5)  # crude wait for the page to finish loading
document = ie.Document
Then jump to the page which doesn't work:
ie.Navigate('http://ieeexplore.ieee.org/xpl/periodicals.jsp')
time.sleep(5)
Now you can access the DOM of the second page via the old Document object.
document.getElementById('browse_keyword').value = "Computer"
If you use the new Document object instead, you will get a TypeError again.
>>> ie.Document.getElementById('browse_keyword')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: getElementById() takes exactly 1 argument (2 given)
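
The trick can be wrapped in a small helper. This is only a sketch under a few assumptions: the function names are hypothetical, the fixed time.sleep(5) is swapped for polling IE's Busy and ReadyState properties (4 is READYSTATE_COMPLETE), and a short settle delay is added because the browser may still report the previous page as complete right after Navigate. It works with any object exposing Navigate, Busy, ReadyState and Document, such as the ie object above.

```python
import time

READYSTATE_COMPLETE = 4  # value from IE's READYSTATE enumeration


def wait_until_loaded(browser, timeout=30.0, poll=0.1, settle=0.5):
    """Poll until the browser reports a finished page; False on timeout."""
    time.sleep(settle)  # give Navigate a moment to actually start
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not browser.Busy and browser.ReadyState == READYSTATE_COMPLETE:
            return True
        time.sleep(poll)
    return False


def open_with_reusable_document(browser, good_url, target_url,
                                timeout=30.0, settle=0.5):
    """Load good_url, keep its Document wrapper, then move on to target_url."""
    browser.Navigate(good_url)
    if not wait_until_loaded(browser, timeout, settle=settle):
        raise RuntimeError("timed out loading %s" % good_url)
    document = browser.Document  # wrapper built from correct type information
    browser.Navigate(target_url)
    if not wait_until_loaded(browser, timeout, settle=settle):
        raise RuntimeError("timed out loading %s" % target_url)
    return document  # the old wrapper, now used against the new page
```

Usage would look like document = open_with_reusable_document(ie, 'http://www.haskell.org/arrows', 'http://ieeexplore.ieee.org/xpl/periodicals.jsp'), followed by document.getElementById('browse_keyword').value = "Computer".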