I have a router that I want to log in to and retrieve information from using a Python script. I'm a newbie to Python but want to learn and explore more with it. Here is what I have written so far:

from requests.auth import HTTPBasicAuth
import requests
from bs4 import BeautifulSoup

response = requests.get('http://192.168.1.1/Settings.html/', auth=HTTPBasicAuth('Username', 'Password'))
html = response.content

soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())

I have two questions:

1. When I run the script the first time, I receive an authentication error. When I run it a second time, it seems to authenticate fine and retrieves the HTML. Is there a better method?

2. With BeautifulSoup (BS) I want to retrieve only the part of the page I need. I can't see a tag to point BS at. At the start of the HTML there is a list of variables whose values I want to scrape, for example:

var Device Pin    = '12345678';

The variable sits inside a <script type="text/javascript"> element. It's much easier to retrieve the information with a single script than to log on to the web interface each time.

Is BS the correct tool for the job? Can I scrape just the one line from the list of variables?

Any help is, as always, very much appreciated.

zeeman

2 Answers

I'd run a packet sniffer, such as tcpdump or Wireshark, to see the interaction between your script and your router. Viewing the traffic may help determine why you're unable to authenticate on the first pass. As a workaround, run the auth section in a for loop that tries to authenticate up to N times before giving up.
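
The for-loop workaround could look roughly like the sketch below. It is only an illustration: the function name fetch_with_retry and the attempts and delay parameters are made up here, and it assumes the failed first attempt shows up as a non-200 status code, which the question does not actually confirm.

```python
import time

import requests
from requests.auth import HTTPBasicAuth


def fetch_with_retry(url, username, password, attempts=3, delay=1.0, session=None):
    """Request `url` up to `attempts` times and return the first 200 response."""
    # A Session reuses the connection and keeps any cookies the router sets.
    if session is None:
        session = requests.Session()
    last_status = None
    for attempt in range(1, attempts + 1):
        response = session.get(url, auth=HTTPBasicAuth(username, password), timeout=5)
        if response.status_code == 200:
            return response
        last_status = response.status_code
        if attempt < attempts:
            time.sleep(delay)  # short pause before trying again
    raise RuntimeError(
        f"no success after {attempts} attempts (last status {last_status})"
    )
```

You would call it as fetch_with_retry('http://192.168.1.1/Settings.html/', 'Username', 'Password') and then read response.text. If the sniffer shows that the router sets a cookie on the first request, the shared Session may be what actually fixes the problem rather than the retry itself.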

Regarding scraping, you may want to consider lxml with the BeautifulSoup parser so you can use XPath. See the question "Can we use XPath with BeautifulSoup?".

XPath would let you easily pull a single value, text node, attribute, etc. from the HTML, provided lxml can parse it.
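
As a rough sketch of the XPath route, assuming the page parses cleanly with lxml: XPath can get you as far as the text of the <script type="text/javascript"> element, but the value inside the JavaScript still needs ordinary string handling afterwards. The helper names script_texts and find_pin are invented for this example.

```python
import re

from lxml import html as lxml_html


def script_texts(page_source):
    """Return the text of every <script type="text/javascript"> element."""
    tree = lxml_html.fromstring(page_source)
    return tree.xpath('//script[@type="text/javascript"]/text()')


def find_pin(page_source):
    """Search only the script blocks for the 'Device Pin' assignment."""
    for text in script_texts(page_source):
        match = re.search(r"var Device Pin\s*=\s*'(\d+)'", text)
        if match:
            return match.group(1)
    return None  # no matching assignment found
```

With the script from the question, find_pin(response.text) should return the PIN as a string, or None if the line is not present.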

Community

As far as I know, BeautifulSoup does not parse JavaScript; it only sees the contents of a <script> element as plain text. In this case, it's simple enough to just use a regular expression:

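import re

# Use response.text (str); response.content is bytes and will not match a str pattern in Python 3
m = re.search(r"var Device Pin\s+= '(\d+)'", response.text)
pin = m.group(1) if m else None  # None if the line was not found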
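
If you end up wanting several of the variables rather than just one, the same idea can be stretched to collect them all into a dict. This is a sketch that assumes every line has the same var Name = 'value'; shape as the example in the question; the names VAR_PATTERN and parse_variables are made up for illustration.

```python
import re

# Matches lines such as:  var Device Pin    = '12345678';
VAR_PATTERN = re.compile(r"var\s+(.+?)\s*=\s*'([^']*)'\s*;")


def parse_variables(page_text):
    """Return {name: value} for every var ... = '...'; assignment found."""
    return {name: value for name, value in VAR_PATTERN.findall(page_text)}
```

Then parse_variables(response.text).get('Device Pin') gives the PIN, and any other variable can be looked up by name in the same way.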

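Regarding the authentication problem, you can wrap your call in a try/except block and repeat the call if it doesn't work the first time. Note that requests only raises an exception for an HTTP error status such as 401 if you call response.raise_for_status(); otherwise check response.status_code yourself.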

Fabricator