
Please note: This problem could be solved easily with the selenium library, but I don't want to use selenium because the host doesn't have a browser installed and I don't want to install one.

Important: I know that render() will download Chromium the first time it runs, and I'm OK with that.

Q: How can I get the page source when it's generated by JS code? For example, from this HP printer:

220.116.57.59

Someone posted online and suggested using:

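from requests_html import HTMLSession

# Create the session before using it.
session = HTMLSession()
r = session.get('https://220.116.57.59', timeout=3, verify=False)
base_url = r.url
# Execute the page's JavaScript in headless Chromium; the result is stored on r.html.
r.html.render()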

But printing r.text doesn't show the full page source; instead, it indicates that JS is disabled:

<div id="pgm-no-js-text">
<p>JavaScript is required to access this website.</p>

<p>Please enable JavaScript or use a browser that supports JavaScript.</p>
</div>

Original Answer: https://stackoverflow.com/a/50612469/19278887 (last part)

Lucan
tom

1 Answer


Grab the config endpoints and then parse the XML for the data you want.

For example:

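import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent header sent with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0"
}

with requests.Session() as s:
    # The printer serves its adapter configuration as XML at this endpoint.
    response = s.get("http://220.116.57.59/IoMgmt/Adapters", headers=headers)

# The "xml" feature requires the lxml package to be installed.
configs = BeautifulSoup(response.text, features="xml").find_all("io:HardwareConfig")

# Print the MAC address of every hardware config that reports one.
print("\n".join(c.find("MacAddress").getText() for c in configs if c.find("MacAddress") is not None))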

Output:

E4E749735068
E4E74973506B
E6E74973D06B
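
If you'd rather not install lxml on the host, the parsing step can probably be done with the standard library instead. This is only a rough sketch of that swap (xml.etree.ElementTree in place of BeautifulSoup), not what the snippet above does. The SAMPLE_XML document, its namespace URIs, and the helper names local_name and extract_mac_addresses are all made up for illustration; the real response from /IoMgmt/Adapters will look different, so check the actual element names first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The namespace URIs and layout are invented;
# the real response from the printer's endpoint will differ.
SAMPLE_XML = """<io:Adapters xmlns:io="http://example.com/hypothetical/io"
             xmlns:dd="http://example.com/hypothetical/dd">
  <io:Adapter>
    <io:HardwareConfig>
      <dd:MacAddress>AABBCCDDEEFF</dd:MacAddress>
    </io:HardwareConfig>
  </io:Adapter>
  <io:Adapter>
    <io:HardwareConfig>
      <dd:Name>no-mac-here</dd:Name>
    </io:HardwareConfig>
  </io:Adapter>
  <io:Adapter>
    <io:HardwareConfig>
      <dd:MacAddress>001122334455</dd:MacAddress>
    </io:HardwareConfig>
  </io:Adapter>
</io:Adapters>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_mac_addresses(xml_text):
    """Return the first MacAddress text found inside each HardwareConfig element."""
    root = ET.fromstring(xml_text)
    macs = []
    for element in root.iter():
        if local_name(element.tag) != "HardwareConfig":
            continue
        # Look through the descendants of this HardwareConfig for a MacAddress.
        for child in element.iter():
            if local_name(child.tag) == "MacAddress" and child.text:
                macs.append(child.text.strip())
                break
    return macs


if __name__ == "__main__":
    print("\n".join(extract_mac_addresses(SAMPLE_XML)))
```

Matching on the local tag name sidesteps having to know the exact namespace URIs up front. To use it against the printer, you would pass the .text of the requests response to extract_mac_addresses in place of SAMPLE_XML.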
baduker