Taking cues from David's comment above, I used this question as a basis for displaying cookies using Google Apps Script:
var _URL = "https://www.dibbs.bsm.dla.mil/Rfq/RFQRecs.aspx?TypeSrch=cq&category=nsn&value=7110-00-001-2667";
function getData(url) {
  // First request: capture the session cookie the server sets.
  // UrlFetchApp only reads request headers from the "headers" object.
  var opt = {
    "method" : "post",
    "headers" : {
      "User-Agent" : "Mozilla/5.0",
      "Accept" : "text/html,application/xhtml+xml,application/xml",
      "Accept-Language" : "en-US,en;q=0.5"
    },
    "followRedirects" : true
  };
  var response = UrlFetchApp.fetch(url, opt);
  var headers = response.getAllHeaders();
  var sessioncookie = headers['Set-Cookie'];
  Logger.log(sessioncookie);
  // Second request: send the cookie back. Only the name=value part belongs in
  // the Cookie header, so drop attributes such as path, secure and HttpOnly.
  opt = {
    "method" : "get",
    "headers" : {
      "User-Agent" : "Mozilla/5.0",
      "Accept" : "text/html,application/xhtml+xml,application/xml",
      "Accept-Language" : "en-US,en;q=0.5",
      "Cookie" : String(sessioncookie).split(";")[0]
    },
    "followRedirects" : true
  };
  var content = UrlFetchApp.fetch(url, opt).getContentText();
  Logger.log("File size: " + content.length);
  ...
}
This returned a cookie called "ASP.NET_SessionId", which looked like this:
ASP.NET_SessionId=y0p5fp1cjl040p1ncr20h2gc; path=/; secure; HttpOnly
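Only the name=value part of a Set-Cookie value is meant to be sent back to the server; everything after the first semicolon (path, secure, HttpOnly) is an instruction to the client. A minimal sketch of trimming the value before reusing it, assuming a helper I'm calling toCookieHeader (a made-up name, not part of the original script):

```javascript
// Hypothetical helper: turn the value of headers['Set-Cookie'] into a
// string suitable for a "Cookie" request header.
function toCookieHeader(setCookie) {
  // getAllHeaders() gives a string for a single Set-Cookie header and an
  // array when the server sends several, so normalize to an array first.
  var values = Array.isArray(setCookie) ? setCookie : [setCookie];
  return values
    .filter(function (v) { return typeof v === "string" && v.length > 0; })
    // Keep only "name=value", i.e. the text before the first semicolon.
    .map(function (v) { return v.split(";")[0].trim(); })
    .join("; ");
}
```

With the cookie shown above, this should leave just the ASP.NET_SessionId=... pair, which is what would go into "headers" : { "Cookie" : ... } on the next request.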
I passed this cookie back in the next HTTP request, hoping to get further, but I still wasn't able to bypass the warning page. While troubleshooting, I got used to going into my Chrome settings and clearing cookies for this site, and then noticed that the site had set not one but three different cookies, including one called "DIBBSDoDWarning" whose content was simply the string "AGREE". Hmm, could that do something?
Experimenting a bit, I found that I could send just this one cookie from the outset and get the page I wanted in a single request:
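var opt = {
  "method" : "get",
  // UrlFetchApp only reads request headers from the "headers" object.
  "headers" : {
    "User-Agent" : "Mozilla/5.0",
    "Accept" : "text/html,application/xhtml+xml,application/xml",
    "Accept-Language" : "en-US,en;q=0.5",
    // A Cookie request header carries name=value pairs only; path, secure
    // and HttpOnly are Set-Cookie attributes and are not sent back.
    "Cookie" : "DIBBSDoDWarning=AGREE"
  },
  "followRedirects" : true
};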
var content = UrlFetchApp.fetch(url, opt).getContentText();
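If more than one cookie ever turns out to be needed, the options can be built from a plain object of cookie names and values. This is only a sketch: buildFetchOptions is a hypothetical helper name, and the Accept headers are simply the ones used above.

```javascript
// Hypothetical helper: build UrlFetchApp options for a GET request that
// sends the given cookies, e.g. { DIBBSDoDWarning: "AGREE" }.
function buildFetchOptions(cookies) {
  // Join the cookies into a single "name=value; name=value" string.
  var pairs = Object.keys(cookies).map(function (name) {
    return name + "=" + cookies[name];
  });
  return {
    "method" : "get",
    "followRedirects" : true,
    "headers" : {
      "Accept" : "text/html,application/xhtml+xml,application/xml",
      "Accept-Language" : "en-US,en;q=0.5",
      "Cookie" : pairs.join("; ")
    }
  };
}

// Intended use inside Apps Script (not runnable outside it):
// var content = UrlFetchApp.fetch(url, buildFetchOptions({ DIBBSDoDWarning: "AGREE" })).getContentText();
```

Keeping the cookies in an object makes it easy to add the session cookie later if the site starts requiring it.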
There is no IMPORTXML support in Google Apps Script to easily scrape a webpage using XPath, so what still remains is to figure out how to do this more elegantly than I'm doing now. I tried using XmlService.parse() to return a Document, but the script consistently fails when it reaches this point (not sure if this page is malformed), so my fallback was a simple string search to get just the number of results returned:
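var pos = content.search('id="ctl00_cph1_lblRecCount"');
// Look at the 22 characters starting 40 characters past the match
// and take the first run of digits found there.
var recordCount = content.substr(pos+40,22).match(/\d+/).join();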
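The fixed offsets above break if the markup shifts by even a character, and .match() returns null (so .join() throws) when no digits are found. A slightly more forgiving sketch does the same job with a single regular expression. It assumes the count is the first number in the text right after the opening tag that carries the id; extractRecordCount is a made-up name, and I haven't checked this against every variant of the page.

```javascript
// Hypothetical helper: pull the record count out of the page HTML.
// Returns a number, or null if the label or a number is not found.
function extractRecordCount(html) {
  // Find the tag with the id, skip to the end of that tag, then capture
  // the first run of digits before the next tag begins.
  var m = /id="ctl00_cph1_lblRecCount"[^>]*>[^<]*?(\d+)/.exec(html);
  return m ? parseInt(m[1], 10) : null;
}
```

Note that a count written with a thousands separator (such as 1,234) would be read as 1, so that case would need extra handling.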
Will update if I figure out a good general Xpath-oriented solution.