On intraday.pro there is an online status indicator that is refreshed periodically. Its content is generated dynamically by JavaScript that sets the element's innerHTML.

I inspected the page with the browser's Inspect Element tool, and this is the HTML it shows:

<div id="is_online">
   <font color="green">Online</font>
</div>

I use the code below, but it returns None and doesn't find the online status.

from bs4 import BeautifulSoup
import requests

r = requests.get("http://intraday.pro/")
soup = BeautifulSoup(r.text, 'html.parser')

is_online = True
while is_online:
    items = soup.find_all("div", {"id": "is_online"})[0].decode_contents()
    if items:
        print(items)
        is_online = False

I also used:

items = soup.find_all("font")
for item in items:
    print(item.get_text())

but again I couldn't find the online status.

This is the JavaScript code that generates the online status:

<script type="text/javascript">

var errtime = 0;
var ftime = 1;
var lastPair = '';

function subscribe(url) {

    var xhr = new XMLHttpRequest();

    if(ftime == 1)
        xhr.open('GET', '/script/table.php?ft=1', true);
    else
        xhr.open('GET', '/script/table.php', true);

    xhr.send();
    xhr.onreadystatechange = function()
    {
        if (xhr.readyState != 4) return;

        var isonline = document.getElementById('is_online');

        if (xhr.status != 200) {
            errtime += 1;
            if(errtime < 3)
            {
                setTimeout( subscribe('/script/table.php') , 30000);
            } else {
                // offline
                isonline.innerHTML = "<font color='red'><b>Offline</b>. Please refresh this page after few minutes</font>";
            }
        } else {
            // online
            isonline.innerHTML = "<font color='green'>online</font>";

            var result = JSON.parse(xhr.responseText);

            var stat24h = document.getElementById('stat24h');
            stat24h.innerHTML = result.stat;

            var table1 = result.table;

            var last1 = result.last;
            var tsumm = 0;
            for(var i=3;i<21;i++)
            {
                for(var j=1;j<14;j++)
                {
                    tsumm = 100*i + j;

                    var test = document.getElementById(i+"_"+j);

                    if(table1[tsumm] != null && test)
                    {
                        test.innerHTML = table1[tsumm];
                    } else {
                        if(test)
                            test.innerHTML = " ";
                    }
                }
            }

            errtime = 0;
            ftime = 2;
            subscribe('/script/table.php');

            if(lastPair != last1 && lastPair != "")
            {
                lastPair = last1;
                soundClick();
            } else {
                lastPair = last1;
            }
        }
    }
}

function soundClick() {
  var audio = new Audio();
  audio.src = '/libs/sounds/sound1.mp3';
  audio.autoplay = true;
}

</script>

Is there any way in BeautifulSoup to get the HTML element once the JavaScript has generated it?

Thanks

mabedis

2 Answers

0

The problem is that bs4 only parses an HTML document that has already been generated. requests fetches the page from the web server as static text and does not run any JavaScript, so whatever the script later writes into the page never shows up in what you parse; to see those updates you need a live browser session. bs4 can still be part of your solution. I recommend using selenium (or dryscrape, which I haven't used), as in the answer linked below, to get the JavaScript-updated elements.

Web-scraping JavaScript page with Python
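
To make the point concrete, here is a minimal sketch using a made-up page. `STATIC_HTML` is an assumption about roughly what the server sends (an empty `is_online` div plus a script that fills it later); the real markup of intraday.pro may differ. No script is executed, so the parser finds the div but nothing inside it.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests.get() would receive.
# The div is present but empty; only a browser would run the script below.
STATIC_HTML = """
<html><body>
<div id="is_online"></div>
<script type="text/javascript">
document.getElementById('is_online').innerHTML = "<font color='green'>online</font>";
</script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# The div itself is found, because it is part of the static markup.
div = soup.find("div", {"id": "is_online"})

# Its contents are empty: the script was parsed as text, never executed.
contents = div.decode_contents()

# The <font> tag only exists inside the script's string, not as an element.
fonts = soup.find_all("font")

print(repr(contents))
print(fonts)
```

If the real page behaves like this, the `while` loop in the question never ends, because `soup` is built once and its contents cannot change between iterations.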

Michael Hearn
  • Thanks for your suggestions. It is a good guide for me. I didn't know that about bs4. So I'd try selenium instead, to see if I can get the online status. – mabedis Dec 17 '19 at 20:58
0
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.firefox.options import Options
import time

# Run Firefox without opening a window.
options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get('http://intraday.pro/')
# Give the page's script time to request its data and fill in the status.
time.sleep(3)
# Parse the page as the browser currently holds it, not the original response.
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
status = soup.find('div', {'id': 'is_online'})
print(status.text)

driver.quit()

Output:

online
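
A browser may not be strictly necessary here. According to the script in the question, the page writes "online" whenever its GET request to /script/table.php comes back with HTTP status 200, so the same check can be sketched with the standard library alone. This is only a sketch under assumptions: `fetch_table` and `online_status` are hypothetical helper names, and it assumes the endpoint answers a plain request without the cookies or headers a browser would send, and that the `?ft=1` variant responds promptly. Unlike the page, it reports "offline" on the first failure instead of retrying.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://intraday.pro"  # site from the question


def fetch_table(first_time=True, timeout=10):
    """Return (status_code, body_text) for the endpoint the page script polls.

    A status of 0 means the request did not complete (DNS failure, timeout, ...).
    """
    url = BASE_URL + "/script/table.php" + ("?ft=1" if first_time else "")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return exc.code, ""
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return 0, ""


def online_status(fetch=fetch_table):
    """Mirror the page's rule: HTTP 200 means online, anything else offline.

    `fetch` is injectable so the logic can be tried without network access.
    Returns (status_text, parsed_json_or_None).
    """
    status, body = fetch()
    if status != 200:
        return "offline", None
    try:
        data = json.loads(body)
    except ValueError:
        # The page sets "online" before parsing, so a bad body is still online.
        data = None
    return "online", data


# Example use (performs a real request):
#     state, data = online_status()
#     print(state)
```

If the endpoint turns out to need a real browser session, the selenium approach above remains the fallback.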