I have a simple program that scrapes a website for some items. I use Angular's `$http` service to call the C# method below, which fetches the page's markup, and then handle everything else in JavaScript. Everything works fine except for one minor annoyance: a bunch of 404 errors.
The 404 errors appear in the browser's developer tools as soon as the `$http` call completes. It looks as though the returned HTML is being interpreted in the browser, which then issues GET requests for all of the images in it, and every one of those requests fails.
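If the scraped page uses relative image paths, that would explain why the failures are specifically 404s: once the browser creates those `<img>` elements in your own page, each relative `src` is resolved against your application's URL rather than the scraped site's, so the requests go to your own server, which doesn't have those files. A minimal sketch of that resolution using the standard `URL` API; both page URLs and the image path are made-up examples:

```javascript
// Hypothetical URLs: the page hosting the Angular app, the site being
// scraped, and a relative image path as it might appear in the scraped markup.
const appPage = 'http://localhost:8080/Menu.aspx';
const scrapedSite = 'http://example.com/menu/';
const imgSrc = 'images/beer.png';

// How the browser resolves the path when the <img> belongs to the app's document.
const resolvedInApp = new URL(imgSrc, appPage).href;
// How the same path resolves on the site the markup actually came from.
const resolvedAtSource = new URL(imgSrc, scrapedSite).href;

console.log(resolvedInApp);    // http://localhost:8080/images/beer.png
console.log(resolvedAtSource); // http://example.com/menu/images/beer.png
```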
What I'm trying to figure out is how to make the 404 errors go away, or at least fail silently (not show up in the console). I haven't found anything in my research, but I assume there is some way to handle this, either on the server or on the client side.
C#
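```csharp
using System;
using System.IO;
using System.Net;
using System.Web.Services;   // [WebMethod]
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Code-behind for AJAX/MenuHandler.aspx (class name assumed from the URL).
public partial class MenuHandler : System.Web.UI.Page
{
    // Page methods called via AJAX must be public static and marked [WebMethod].
    [WebMethod]
    public static string GetPageSource()
    {
        JObject result = new JObject();
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://awebpage.html");
            // using blocks dispose the response and reader even if reading throws.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                result["data"] = reader.ReadToEnd(); // raw HTML of the scraped page
                result["success"] = true;
            }
        }
        catch (Exception ex)
        {
            result["data"] = ex.Message;
            result["success"] = false;
        }
        return JsonConvert.SerializeObject(result);
    }
}
```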
JS
```javascript
$scope.getPageSource = function () {
    var ajaxProcessor = Utils.ajaxMessage('Scraping Beer Menu From Source');
    ajaxProcessor.start();
    $http({
        method: 'POST',
        url: 'AJAX/MenuHandler.aspx/GetPageSource',
        // $http has no contentType/dataType options (those are jQuery.ajax
        // settings); the request content type is set through headers.
        headers: { 'Content-Type': 'application/json; charset=utf-8' },
        data: {}
    }).then(function (response) {
        ajaxProcessor.stop();
        // The page method returns a JSON string wrapped in ASP.NET's "d"
        // property, so it has to be parsed a second time.
        var result = JSON.parse(response.data.d);
        if (result.success === false) {
            Utils.showMessage('error', result.data);
        } else {
            // result.data holds the raw HTML of the scraped page.
            var beerMenu = new BeerMenu(result.data, $scope.loggedInUser, function (beerMenu) {
                $scope.buildDisplayMenu(beerMenu);
            });
        }
    }, function (err) {
        ajaxProcessor.stop();
        console.log(err);
        Utils.showMessage('error', err.data.Message);
    });
};
```
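Because `GetPageSource` returns a string that is already JSON, and ASP.NET page methods wrap whatever they return in a `d` property, the client receives JSON nested inside JSON. A small helper can do that second parse in one place and return an error result if the payload isn't shaped the way the C# method produces it. This is a sketch; the name `unwrapPageMethodResult` is hypothetical:

```javascript
// Hypothetical helper: unwraps the { d: "<json string>" } envelope that an
// ASP.NET page method produces when it returns an already-serialized string.
function unwrapPageMethodResult(responseData) {
  if (!responseData || typeof responseData.d !== 'string') {
    return { success: false, data: 'Unexpected response shape' };
  }
  try {
    const result = JSON.parse(responseData.d);
    // Only an explicit success: true counts as success.
    return { success: result.success === true, data: result.data };
  } catch (e) {
    // Covers invalid JSON and payloads such as "null".
    return { success: false, data: 'Invalid payload in "d": ' + e.message };
  }
}

// Example payload shaped like what GetPageSource would return.
const sample = { d: JSON.stringify({ data: '<div class="TabbedPanelsContent"></div>', success: true }) };
console.log(unwrapPageMethodResult(sample).success); // true
```

Inside the success callback this would take the place of the direct parse, e.g. `var result = unwrapPageMethodResult(response.data);`.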
UPDATE
Thanks to @dandavis, I've narrowed the issue down to the `$.parseHTML` call that runs inside `buildDisplayMenu` (through `buildCurrentMenu`, shown below). Is there any way to make it ignore the images, or skip GET requests altogether?
```javascript
buildCurrentMenu: function () {
    var html = $.parseHTML(this.pageSource);
    var menuDiv = $(html).find('.TabbedPanelsContent')[0];
    var categories = $(menuDiv).find('h2');
    var categegoryItems = [];
    var beerArray = [];
    for (var i = 0; i < categories.length; i++) {
        ...
    }
    return beerArray;
}
```
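One client-side workaround, sketched under the assumption that the menu code only needs the text and structure of the page and never the images: rename the `src`/`srcset` attributes on `<img>` tags before the string reaches `$.parseHTML`, so the browser has no image URLs to request. The function name `neutralizeImages` is hypothetical, and a regular expression is a blunt tool for HTML (it won't catch `<picture>`/`<source>`, CSS backgrounds, or an attribute-like string inside another attribute's value). Other options worth testing are passing a detached document as the context argument, `$.parseHTML(this.pageSource, document.implementation.createHTMLDocument(''))`, or parsing with `new DOMParser().parseFromString(html, 'text/html')`; browsers generally don't fetch images for elements owned by such documents, but verify that in your target browsers. The same rewrite could also be applied on the server in `GetPageSource` before the markup is returned.

```javascript
// Hypothetical helper: renames src/srcset on <img> tags to data-src/data-srcset
// so that parsing the markup in the live document gives the browser no image
// URLs to fetch. Regex-based, best-effort; not a full HTML parser.
function neutralizeImages(html) {
  return html.replace(/<img\b[^>]*>/gi, function (tag) {
    // Requiring whitespace right before the name leaves existing data-src alone.
    return tag.replace(/(\s)(src|srcset)(\s*=)/gi, '$1data-$2$3');
  });
}

const page = '<div class="TabbedPanelsContent"><h2>IPAs</h2><img src="images/ipa.png" alt="IPA"></div>';
console.log(neutralizeImages(page));
// <div class="TabbedPanelsContent"><h2>IPAs</h2><img data-src="images/ipa.png" alt="IPA"></div>
```

In `buildCurrentMenu` this would mean parsing `neutralizeImages(this.pageSource)` instead of `this.pageSource`; the rest of the traversal is unaffected because it only looks at `.TabbedPanelsContent` and the `h2` elements.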