I want to retrieve the SVG content that this HTML page generates dynamically:

index.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>BmHtmlGenerator</title>
</head>
<body>
<div id="svgContainer"></div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/bodymovin/5.4.3/lottie.min.js"></script>
<script>
    let svgContainer =  window.bodymovin.loadAnimation({
        container: document.getElementById('svgContainer'),
        renderer: 'svg',
        loop: false,
        autoplay: false,
        path: 'https://labs.nearpod.com/bodymovin/demo/markus/isometric/markus2.json',

    });
</script>
</body>
</html>

After reading many posts online, I decided it would be best to use Selenium/ChromeDriver with Django, like this:

from bs4 import BeautifulSoup
from django.conf import settings
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from contextlib import closing
from selenium.webdriver import Chrome # pip install selenium
from selenium.webdriver.support.ui import WebDriverWait


url = "myurl/index.html"
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument('--disable-dev-shm-usage')
browser = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver', options=options, service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])

max_wait = 30
browser.set_page_load_timeout(max_wait)
browser.set_script_timeout(max_wait)

browser.get(url)
browser.implicitly_wait(30)
print(browser.page_source)
browser.close()
browser.quit()

But it doesn't work: print always outputs the original HTML source instead of the generated markup.

I also tried with:

wait = WebDriverWait(browser, timeout=30).until(lambda x: x.find_element_by_tag_name('svg'))
print(wait)
page_source = browser.page_source
print(page_source)

But it always throws this error:

 File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/core/handlers/exception.py", line 35, in inner
    response = get_response(request)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/core/handlers/base.py", line 128, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/core/handlers/base.py", line 126, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/views/generic/base.py", line 69, in view
    return self.dispatch(request, *args, **kwargs)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/rest_framework/views.py", line 494, in dispatch
    response = self.handle_exception(exc)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/rest_framework/views.py", line 454, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/rest_framework/views.py", line 491, in dispatch
    response = handler(request, *args, **kwargs)
  File "/vagrant/src/meshine_project/meshine_api/views.py", line 468, in post
    bm.create_file()
  File "/vagrant/src/meshine_project/meshine_api/HtmlFileGenerator/BmJsonGenerator.py", line 41, in create_file
    wait = WebDriverWait(browser, timeout=20).until(lambda x: x.find_element_by_tag_name('svg'))
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

Please help!

I just wanted to follow what's described here and here, but nothing works.

S.S. Anne
kabrice
  • Can you post the entire stack trace of the error? It looks like it's failing to find the `svg` element, but I can't tell for sure. And you'll want to set the `WebDriverWait(...).until(...)` line equal to a variable; `until()` returns a WebElement object if one is found before the timeout. From this WebElement, you can search its `text` attribute for your svg content, which is what was done in the first S.O. post you linked – natn2323 Nov 12 '19 at 06:09
  • I'm curious why you're extracting `browser.page_source`. Do you need the image or just the generated code? – Naveen Nov 12 '19 at 06:13
  • @natn2323, just edited my code accordingly with the entire stack trace. – kabrice Nov 12 '19 at 06:26
  • @Naveen if I can get the generated code with the svg in it, it would be very easy to extract the svg tag (through BeautifulSoup, for example) – kabrice Nov 12 '19 at 06:28
  • It looks like your "svg" tag name isn't working out. This could be due to a couple reasons, but usually the main reason is that there are multiple "svg" elements. Can you try with a more specific identifier, like a CSS Selector or XPath? – natn2323 Nov 12 '19 at 06:34
  • In the browser inspector, when I load my page, I see just one `svg` element. Anyway, I've tried with `wait = WebDriverWait(browser, timeout=20).until(lambda x: x.find_element_by_id('svgContainer'))` and I still get the exact same error – kabrice Nov 12 '19 at 06:41
  • I'm not sure Selenium is needed just to extract the markup with `BeautifulSoup`. Using `requests` is recommended. If you do need Selenium, you can take a screenshot and store it as a file, as explained here: https://stackoverflow.com/a/3423347/7964299 – Naveen Nov 12 '19 at 08:14
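
The suggestions in the comments above (keep the result of the wait in a variable, and target the element with a more specific selector) could be sketched roughly as follows. This is only an illustration and is not confirmed to resolve the timeout in the question. The function name `wait_for_svg_markup` is hypothetical, the selector `#svgContainer svg` is an assumption based on the container id in index.html, and the loop is a hand-rolled poll rather than `WebDriverWait`, so the only driver method it relies on is `execute_script`.

```python
import time


def wait_for_svg_markup(driver, selector="#svgContainer svg", timeout=20.0, poll=0.5):
    """Poll the page until `selector` matches, then return the element's outerHTML.

    `driver` is any object exposing Selenium's `execute_script(script, *args)`.
    Returns None if nothing matched before `timeout` seconds elapsed.
    (Hypothetical helper; the selector default is an assumption.)
    """
    # The JavaScript runs inside the page, so it sees the DOM after scripts ran.
    script = (
        "var el = document.querySelector(arguments[0]);"
        "return el ? el.outerHTML : null;"
    )
    deadline = time.monotonic() + timeout
    while True:
        markup = driver.execute_script(script, selector)
        if markup:
            return markup          # the serialized <svg>...</svg> string
        if time.monotonic() >= deadline:
            return None            # give up without raising
        time.sleep(poll)
```

With the `browser` object from the question, this would be called as `svg = wait_for_svg_markup(browser)` after `browser.get(url)`. If it returns None, the `svg` element was never inserted within the timeout, which points at the page (for example, the animation JSON failing to load in the headless browser) rather than at the way the element is looked up.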

0 Answers