I am trying to get a website's (habr.com) `page_source` using python3 and selenium, but there are some `svg` tags that contain `use` tags with an `xlink:href` attribute and a `#shadow-root (closed)` in them. That piece of HTML looks like this:
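(A sketch of the structure as Chrome DevTools renders it; the sprite path and icon id are made up:)

```html
<svg class="icon">
  <use xlink:href="/img/sprite.svg#icon-user">
    #shadow-root (closed)
      <svg id="icon-user">…</svg>
  </use>
</svg>
```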
My Python code is this:
```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://habr.com/en')
source = driver.page_source
```
I am running this in the interactive Python interpreter, not from a file, so there is plenty of time for all the resources to load.
So, the `source` variable will contain all the HTML except this `#shadow-root (closed)` part.
I tried this solution, but I guess it works only with `#shadow-root (open)`.
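What I tried boils down to roughly this (a sketch; the CSS selector is a guess at one of the affected elements):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://habr.com/en')

# A guess at one of the affected elements.
use_elem = driver.find_element(By.CSS_SELECTOR, 'svg use')

# Page JavaScript only sees open shadow roots: for #shadow-root (open)
# this returns the root, but for #shadow-root (closed) the shadowRoot
# property is null, so this comes back as None.
shadow = driver.execute_script('return arguments[0].shadowRoot', use_elem)
print(shadow)  # None for a closed root
```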
What should I do to obtain the whole HTML source, including those Shadow DOM parts?
UPDATE:
The whole point of this is that I want to make some kind of proxy server that points at a desired website and rewrites all the links on the page to my `localhost`. To make proper tests, I wanted to get the source HTML from the target website and compare it tag by tag with my `localhost`'s source HTML. But I cannot do that until I get this Shadow DOM content. I mean, I can do it, but it just will not be an objective test.
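For context, the tag-by-tag comparison I have in mind looks roughly like this (a sketch; the localhost URL is hypothetical, and only tag names are compared):

```python
from html.parser import HTMLParser

from selenium import webdriver


class TagCollector(HTMLParser):
    """Collects opening tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def page_tags(driver, url):
    """Loads a page and returns its tag sequence."""
    driver.get(url)
    collector = TagCollector()
    collector.feed(driver.page_source)
    return collector.tags


driver = webdriver.Chrome()
original = page_tags(driver, 'https://habr.com/en')
proxied = page_tags(driver, 'http://localhost:8000/en')  # hypothetical proxy URL

# Without the shadow DOM content, page_source misses everything inside
# #shadow-root (closed), so this comparison is not objective yet.
print(original == proxied)
```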