I need to retrieve a page's HTML using the Chrome DevTools Protocol, but I have a problem: the behaviour is not what I expected.

I'm using https://chromedevtools.github.io/devtools-protocol/tot/DOM/#method-getOuterHTML and PyChromeDevTools to manage the Chrome page.

From the terminal:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

Python script:

import PyChromeDevTools

chrome = PyChromeDevTools.ChromeInterface(host='localhost',port=9222)
chrome.Network.enable()
chrome.Page.enable()
chrome.Page.navigate(url="https://stackoverflow.com")
html_code=chrome.DOM.getOuterHTML()
print(html_code)

I'm not sure that DOM.getOuterHTML is the correct command, but it seems suitable for the purpose.

  • add `DOM.getDocument()` – Juraj Jan 14 '22 at 17:30
  • and you should use `chrome.wait_event("Page.loadEventFired", timeout=60)` – Juraj Jan 14 '22 at 18:27
  • @Juraj thanks, resolved! So is it possible to execute some JavaScript code instead? I'm trying this method: https://chromedevtools.github.io/devtools-protocol/tot/Runtime/#method-runScript – Francisco Sour Jan 15 '22 at 09:39
  • Yes, most interaction with the page can be done by executing JavaScript. I had to learn the DOM interface to put files into a file input. I use the protocol with a Java library, and it may work differently from the Python library. Did you try it without DOM.getDocument()? Wasn't waiting for the page to load enough? – Juraj Jan 15 '22 at 09:48
  • Yes, I forgot to set the wait event timeout. For the HTML, it works once the page has loaded and DOM.getDocument() has been called first. Do you have an example in Java for executing JS? – Francisco Sour Jan 15 '22 at 10:04
  • https://github.com/kklisura/chrome-devtools-java-client/tree/master/cdt-examples/src/main/java/com/github/kklisura/cdt/examples – Juraj Jan 16 '22 at 06:30
  • @Juraj I have found a solution, but now I have another problem :) https://stackoverflow.com/questions/70730569/executing-javascript-code-thought-chrome-dev-tools-protocol – Francisco Sour Jan 16 '22 at 13:38

1 Answer

You have to wait until the page has loaded in the browser. The library you use has an example of that in its README.

chrome.Page.navigate(url="http://www.google.com/")
chrome.wait_event("Page.loadEventFired", timeout=60)

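The DOM domain may also need a chrome.DOM.getDocument() call before you access elements, since DOM.getOuterHTML has to be told which node to serialize (for example, the root nodeId that getDocument returns).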
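
Putting both points together, here is a minimal sketch of the whole sequence. The helper name `fetch_outer_html` is hypothetical. It assumes that each PyChromeDevTools call returns a `(response, messages)` tuple, as the library's README examples suggest, and that the protocol's reply sits under the `"result"` key of the response. Check both against the version you have installed.

```python
def fetch_outer_html(chrome, url, timeout=60):
    """Navigate to `url`, wait for the load event, and return the document's outer HTML.

    `chrome` is expected to be a PyChromeDevTools.ChromeInterface (or anything
    exposing the same Page/DOM domains and a wait_event method).
    """
    # Page events such as Page.loadEventFired are only delivered after Page.enable().
    chrome.Page.enable()
    chrome.Page.navigate(url=url)

    # Block until the browser reports that the page has finished loading.
    chrome.wait_event("Page.loadEventFired", timeout=timeout)

    # Assumption: calls return (response, messages), with the protocol's
    # reply stored under response["result"].
    document, _messages = chrome.DOM.getDocument()
    root_node_id = document["result"]["root"]["nodeId"]

    # DOM.getOuterHTML needs to know which node to serialize.
    html, _messages = chrome.DOM.getOuterHTML(nodeId=root_node_id)
    return html["result"]["outerHTML"]
```

With Chrome started with `--remote-debugging-port=9222`, this could be called as `fetch_outer_html(PyChromeDevTools.ChromeInterface(host='localhost', port=9222), "https://stackoverflow.com")`. Passing the interface in as a parameter keeps the helper independent of how the connection is created.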

Juraj