Disclaimer: I know that parsing HTML with regex is not the correct approach. I am actually just trying to parse text inside the HTML.
I am parsing several pages, and I am looking for prices. Here is what I have so far:
// Select every element in the body except <script> elements
var all = document.body.querySelectorAll(":not(script)");
var regex = /\$[0-9,]+(\.[0-9]{2})?/g;
for (var i = 0; i < all.length; i++) {
    // Elements themselves have a null nodeValue, so inspect their child nodes
    for (var j = 0; j < all[i].childNodes.length; j++) {
        var node_value = all[i].childNodes[j].nodeValue;
        if (node_value !== null) {
            var matches = node_value.match(regex);
            if (matches !== null) {
                alert("that's a match");
            }
        }
    }
}
This particular code can get me prices like this:
<div>This is the current price: <span class="current">$60.00</span></div>
However, there are some prices that have the following structure:
<div>This is the current price: <sup>$</sup><span>80.00</span></div>
How could I improve the algorithm so that it finds those prices as well? Should I look for the <sup>symbol</sup><span>price</span> pattern with a regex in the first for loop?
Important: Once there is a match, I need to find out which DOM element is holding that price, specifically the innermost element that holds it. For example:
<div><span>$80.00</span></div>
Here I would need to report the span as the element holding the price, not the div.
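To make the "innermost element" requirement concrete, here is a rough sketch of one direction I am considering; it is not a finished solution. It tests each element's combined text (textContent) instead of individual text nodes, and reports an element only when none of its child elements contains a complete price on its own. The helper name findPriceElements is made up for illustration, and the function only assumes that each node exposes tagName, textContent and children.

```javascript
// Price pattern from the question, without the "g" flag so that
// RegExp.prototype.test() stays stateless between calls.
var PRICE_REGEX = /\$[0-9,]+(\.[0-9]{2})?/;

// Hypothetical helper (not an existing API): collects the innermost
// elements whose combined text contains a complete price.
function findPriceElements(element, results) {
    results = results || [];

    // Skip script and style content, in the spirit of ":not(script)"
    var tag = (element.tagName || "").toUpperCase();
    if (tag === "SCRIPT" || tag === "STYLE") {
        return results;
    }

    // No price anywhere in this subtree: nothing to report
    if (!PRICE_REGEX.test(element.textContent || "")) {
        return results;
    }

    // Let the child elements claim the price first
    var children = Array.from(element.children || []);
    var countBefore = results.length;
    for (var i = 0; i < children.length; i++) {
        findPriceElements(children[i], results);
    }

    // No child holds a complete price by itself, so this element is
    // the innermost one that does
    if (results.length === countBefore) {
        results.push(element);
    }
    return results;
}
```

In a browser this would be called as findPriceElements(document.body). For the <sup>$</sup><span>80.00</span> markup it would report the surrounding div, since neither child holds the full price alone. One limitation I am aware of: if an element has one price in a child and a second price in its own text, only the child is reported.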