I'm new to Python. I want to scrape my modem's web interface and read its DOM so that I can collect some status information, but I don't know how it's done. Is it possible to scrape this local device through its IP address, 192.168.1.1?
Also, when you open that IP address, the browser shows a login prompt, and I don't know how to fill it in with Scrapy.
This is what I've written, but it's not working: the res.html file gets created, but it's empty.
import scrapy


class ScrapperSpider(scrapy.Spider):
    handle_httpstatus_list = [401]
    name = "scrapper"
    start_urls = ["http://192.168.1.1/"]
    auth = "Basic YWRtaW46YWRtaW4="

    def parse(self, response):
        return scrapy.Request(
            "http://192.168.1.1/",
            headers={'Authorization': self.auth},
            callback=self.after_login
        )

    def after_login(self, response):
        with open('res.html', 'wb') as f:
            f.write(response.xpath('//*[@id="box_header"]/tbody/tr[1]/td').extract())
I got the response content with response.text, and here it is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<META HTTP-EQUIV="pragma" CONTENT="no-cache">
<title>ADSL Router</title>
<script language="javascript" src="util.js"></script>
<script>
function closeWindow(){
var currBrowser;
currBrowser = GetBrowserOS();
switch(currBrowser)
{
case "msiewin":
case "msiemac":
case "netslin":
window.opener = self;
window.close();
break;
case "netswin":
case "firelin":
case "firewin":
case "firemac":
window.open('','_parent','');
window.close();
break;
default:
window.opener = self;
window.close();
break;
}
}
function op() {}
</script>
</head>
<blockquote>
<frameset rows="0,*" frameborder="0" framespacing="0">
<frame name="fPanel" src="" scrolling="auto" marginwidth="0" marginheight="0">
<frame name="main" src="internet.htm">
<noframes>
<body bgcolor="#008080">
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</blockquote>
</html>
I don't know how to tell whether I passed the authorization, or even whether I'm sending the right request. I inspected the Network tab while logging in, but there was no POST request in any of the entries; the only part that seemed related to logging in was Authorization: Basic YWRtaW46YWRtaW4= in the request headers. I think I must be logged in, though, because the response has the contents above.
By the way, I used the code from this question: Scrapy to bypass an alert message with form authentication
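For reference, the Authorization value seen in the request headers is just HTTP Basic authentication: the text `user:password` encoded with Base64. Here is a minimal sketch, using only the standard library, that decodes the header from above and rebuilds it. The helper names `decode_basic_auth` and `encode_basic_auth` are made up for illustration.

```python
import base64


def decode_basic_auth(header_value):
    """Split a 'Basic <token>' header value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    # The token is base64("user:password"); split on the first colon only.
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


def encode_basic_auth(user, password):
    """Build the header value for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(decode_basic_auth("Basic YWRtaW46YWRtaW4="))  # ('admin', 'admin')
print(encode_basic_auth("admin", "admin"))          # Basic YWRtaW46YWRtaW4=
```

So there is no login form or POST to look for: the browser's login prompt simply makes it resend the same GET request with this header attached.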
EDIT: Never mind, I think it actually logs in: I inspected the response to a request for http://192.168.1.1/internet.htm and it has the content of the modem's first page. Now I need to figure out how to switch to the other tabs, etc.
EDIT: There is no need to switch tabs. I hovered the mouse over the link to the page I needed and saw that it's located at http://192.168.1.1/adslconfig.htm. I sent a request there and got everything I needed in response.text.
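The reason the root page looked useless is that it is only a frameset; the real content lives at the URLs in the frame `src` attributes. A rough sketch, standard library only, of pulling those URLs out of the HTML shown above. `FrameCollector` and `frame_urls` are illustrative names, and `SAMPLE` is a trimmed copy of the frameset from the response.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect the non-empty src attribute of every <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:  # skip frames with src=""
                self.sources.append(src)


def frame_urls(base_url, html):
    """Return absolute URLs for all frames found in the given HTML."""
    collector = FrameCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


SAMPLE = """<frameset rows="0,*" frameborder="0" framespacing="0">
<frame name="fPanel" src="" scrolling="auto" marginwidth="0" marginheight="0">
<frame name="main" src="internet.htm">
</frameset>"""

print(frame_urls("http://192.168.1.1/", SAMPLE))
# ['http://192.168.1.1/internet.htm']
```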
Done!