How to read an element that is periodically generated via innerHTML with BeautifulSoup?

Question

On intraday.pro there is an online status indicator that is refreshed periodically. The element's content is generated dynamically by JavaScript code that sets innerHTML.

I inspected the HTML with the browser's Inspect Element, and this is the markup:

<div id="is_online">
   <font color="green">Online</font>
</div>

I use the code below, but it returns None and doesn't find the online status.

from bs4 import BeautifulSoup
import requests

r = requests.get("http://intraday.pro/")
soup = BeautifulSoup(r.text, 'html.parser')

is_online = True
while is_online:
    items = soup.find_all("div", {"id": "is_online"})[0].decode_contents()
    if items:
        print(items)
        is_online = False

I also tried:

items = soup.find_all("font")
for item in items:
    print(item.get_text())

but again I couldn't find the online status.

This is the JavaScript code that generates the online status:

<script type="text/javascript">

var errtime = 0;
var ftime = 1;
var lastPair = '';

function subscribe(url) {

    var xhr = new XMLHttpRequest();

    if(ftime == 1)
        xhr.open('GET', '/script/table.php?ft=1', true);
    else
        xhr.open('GET', '/script/table.php', true);

    xhr.send();
    xhr.onreadystatechange = function()
    {
        if (xhr.readyState != 4) return;

        var isonline = document.getElementById('is_online');

        if (xhr.status != 200) {
            errtime += 1;
            if(errtime < 3)
            {
                setTimeout( subscribe('/script/table.php') , 30000);
            } else {
                // offline
                isonline.innerHTML = "<font color='red'><b>Offline</b>. Please refresh this page after few minutes</font>";
            }
        } else {
            // online
            isonline.innerHTML = "<font color='green'>online</font>";

            var result = JSON.parse(xhr.responseText);

            var stat24h = document.getElementById('stat24h');
            stat24h.innerHTML = result.stat;

            var table1 = result.table;

            var last1 = result.last;
            var tsumm = 0;
            for(var i=3;i<21;i++)
            {
                for(var j=1;j<14;j++)
                {
                    tsumm = 100*i + j;

                    var test = document.getElementById(i+"_"+j);

                    if(table1[tsumm] != null && test)
                    {
                        test.innerHTML = table1[tsumm];
                    } else {
                        if(test)
                            test.innerHTML = " ";
                    }
                }
            }

            errtime = 0;
            ftime = 2;
            subscribe('/script/table.php');

            if(lastPair != last1 && lastPair != "")
            {
                lastPair = last1;
                soundClick();
            } else {
                lastPair = last1;
            }
        }
    }
}

function soundClick() {
  var audio = new Audio();
  audio.src = '/libs/sounds/sound1.mp3';
  audio.autoplay = true;
}

</script>

Is there a way to use BeautifulSoup to get the HTML element once the JavaScript has generated it?

Thanks

Which online status are you asking about? Also check http://intraday.pro/script/table.php?ft=1 – αԋɱҽԃ αмєяιcαη, Dec 15 '19 at 00:26
At the end of the page, after the coin predictions, there is a label that indicates whether the predictions are active (`online`) or inactive (`offline`). This status needs to be checked before taking the other values into consideration. Thanks for your comment. – mabedis, Dec 16 '19 at 15:23
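The endpoint mentioned in the comment above is the same one the page's script requests, and the script shows "online" whenever that request returns HTTP 200. A minimal sketch of checking it directly, without a browser, might look like the following. It uses only the standard library; `interpret` and `fetch_status` are hypothetical helper names, and it is an assumption that the server answers a plain GET without cookies or special headers and that the body is valid JSON. Unlike the page, which retries a few times before showing "offline", this sketch treats a single failed request as offline.

```python
import json
import urllib.error
import urllib.request

# URL taken from the page's script (first request uses ?ft=1).
ENDPOINT = "http://intraday.pro/script/table.php?ft=1"


def interpret(status_code, body):
    """Mirror the page's rule: HTTP 200 means online, anything else offline."""
    if status_code != 200:
        return "offline", None
    # The page's script calls JSON.parse on the response text.
    return "online", json.loads(body)


def fetch_status(url=ENDPOINT, timeout=10):
    """Request the endpoint once and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError by urllib.
        return interpret(err.code, "")
    except urllib.error.URLError:
        # Network-level failure (DNS, refused connection, timeout).
        return "offline", None
```

Calling `fetch_status()` returns a `(label, data)` pair; according to the page's script, `data` should carry the keys `stat`, `table` and `last` when the request succeeds.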

score 0 · Answer 1 · answered Dec 15 '19 at 04:23

The problem is that bs4 parses an HTML document that has already been generated. You use requests to pull it from the web server, so what you get is a one-time snapshot on which no JavaScript ever runs; you need an active browser session for the content to update. bs4 can still be part of your solution. I recommend using selenium (or dryscrape, which I haven't used), as in the answer linked below, to get the JavaScript-updated elements.

Web-scraping JavaScript page with Python
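To illustrate the point about snapshots, here is a small sketch. The `STATIC_HTML` string is a hypothetical stand-in for what the server might send before any script runs (the real page's raw HTML may differ). BeautifulSoup keeps the script as inert text and never executes it, so the div stays empty and no `font` tag is found, no matter how often the same `soup` is searched in a loop.

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML: the div is empty until a browser runs the script.
STATIC_HTML = """
<div id="is_online"></div>
<script>
document.getElementById('is_online').innerHTML = "<font color='green'>online</font>";
</script>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")
div = soup.find("div", id="is_online")

# The script was parsed as text, not executed, so nothing was injected.
print(repr(div.decode_contents()))
print(soup.find_all("font"))
```

Re-running the search only helps if the page is fetched and parsed again each time, and even then only if the server itself sends the filled-in element.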

answered Dec 15 '19 at 04:23

Michael Hearn

Thanks for your suggestions. It is a good guide for me. I didn't know that about bs4. So I'll try selenium instead to see if I can get the online status. – mabedis Dec 17 '19 at 20:58

αԋɱҽԃ αмєяιcαη · Accepted Answer · 2019-12-17T21:08:59.667

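import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Run Firefox without opening a visible browser window.
options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
try:
    driver.get('http://intraday.pro/')
    # Give the page's script time to fill in the status element.
    time.sleep(3)
    html = driver.page_source
finally:
    # Always close the browser, even if loading the page fails.
    driver.quit()

# Parse the rendered HTML (after JavaScript has run) with BeautifulSoup.
soup = BeautifulSoup(html, 'html.parser')
status = soup.find('div', {'id': 'is_online'})
print(status.text)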

Output:

online
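A fixed `time.sleep(3)` can be too short on a slow connection and wasteful on a fast one. As a possible variation (not part of the original answer), Selenium's `WebDriverWait` can poll until the status div actually contains text. In this sketch `status_text` and `read_status` are hypothetical names, the 15-second timeout is an arbitrary choice, and the Selenium imports sit inside `read_status` so that the wait condition can be exercised without a browser.

```python
def status_text(driver):
    """Wait condition: return the status text once the div is filled, else False."""
    # "id" is the locator string that selenium's By.ID stands for.
    elements = driver.find_elements("id", "is_online")
    if not elements:
        return False
    text = elements[0].text.strip()
    return text or False


def read_status(url="http://intraday.pro/", timeout=15):
    """Open the page in headless Firefox and wait for the status to appear."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # until() calls status_text(driver) repeatedly and returns its first
        # truthy result; it raises TimeoutException if none arrives in time.
        return WebDriverWait(driver, timeout).until(status_text)
    finally:
        driver.quit()
```

Calling `print(read_status())` should then print the status as soon as the script has written it, instead of after a fixed delay.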

edited Dec 17 '19 at 21:08

answered Dec 16 '19 at 20:02

αԋɱҽԃ αмєяιcαη

Yes, it works. Thanks a lot. But is there a way to keep the browser from opening during this operation? – mabedis Dec 17 '19 at 21:05
Thanks for your help. It works well, and I've marked the answer as accepted. :) – mabedis Dec 19 '19 at 17:12
