I'm a bit of a noob to coding so sorry if this is a dumb question, but I'm trying to write a general purpose scraper for getting some product data using the "schema.org/Product" HTML microdata.
However, I ran into an issue when testing: there were ancestor elements with different itemtypes/schemas (on this page in particular, the name was being set to "Electronics" from the Breadcrumbs schema).
I first have this variable declared to check if the page has an element using the Product schema microdata.
var productMicrodata = document.querySelector('[itemscope][itemtype="https://schema.org/Product"], [itemscope][itemtype="http://schema.org/Product"]');
I then wanted to select all elements with the itemprop attribute, e.g.:
productMicrodata.querySelectorAll('[itemprop]');
The issue however is that I want to ignore any elements that have other ancestors with different itemtypes/schema attributes, as in this instance the Breadcrumbs and ListItem schema data is still being included.
I figured I would then just be able to do something like this:
productMicrodata.querySelectorAll(':not([itemscope]) [itemprop]');
However, this still returns child elements whose ancestors have a different itemtype (e.g. the breadcrumbs).
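A likely explanation for the over-matching, sketched as a plain-JavaScript model rather than real DOM code: the descendant combinator only needs one ancestor that satisfies :not([itemscope]), and querySelectorAll evaluates the selector against the whole document, so ancestors such as body count too. The function names and the isProductRoot flag below are hypothetical, made up for this illustration.

```javascript
// Model only: an element is represented by its ancestor chain (nearest first),
// each ancestor reduced to a flag saying whether it carries itemscope.

// ':not([itemscope]) [itemprop]' is satisfied by ANY ancestor lacking itemscope.
function matchesSelector(ancestors) {
  return ancestors.some((a) => !a.itemscope);
}

// What the question is after: the nearest itemscope ancestor is the Product root.
function isDirectlyInRoot(ancestors) {
  const nearestScope = ancestors.find((a) => a.itemscope);
  return nearestScope !== undefined && nearestScope.isProductRoot === true;
}

// Ancestors of a breadcrumb "name" element, nearest first.
const breadcrumbName = [
  { itemscope: true },                      // ListItem
  { itemscope: true },                      // BreadcrumbList
  { itemscope: true, isProductRoot: true }, // Product
  { itemscope: false },                     // body
];

console.log(matchesSelector(breadcrumbName));  // true: body has no itemscope
console.log(isDirectlyInRoot(breadcrumbName)); // false: nearest scope is ListItem
```

If this model is right, a pure descendant selector cannot express "no other itemscope in between", which would explain why the breadcrumb elements keep showing up.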
I'm sure I'm just missing something super obvious, but any help on how I can select only elements that have just the one ancestor with the itemtype="http://schema.org/Product" attribute would be much appreciated.
EDIT: To clarify where the elements I'm trying to avoid matching are, here's what the DOM looks like on the example page linked. I'm trying to ignore the elements that have any ancestors with a different itemtype attribute.
EDIT 2: Changed incorrect use of "parent" to "ancestor". Apologies, I am still new to this :|
EDIT 4/SOLUTION: I've found a non-CSS solution for what I'm trying to achieve using the JavaScript Element.closest() method, e.g.:
// Collect every itemprop element, then keep only those whose nearest
// itemtype ancestor (or the element itself) is a schema.org Product.
const PRODUCT_TYPES = ['http://schema.org/Product', 'https://schema.org/Product'];
const itemPropElements = document.querySelectorAll('[itemprop]');
const itemProp = {};
for (const el of itemPropElements) {
  // closest() returns null when nothing in the chain has itemtype, so guard before reading it
  const scope = el.closest('[itemtype]');
  if (scope && PRODUCT_TYPES.includes(scope.getAttribute('itemtype'))) {
    itemProp[el.getAttribute('itemprop')] = el.textContent;
  }
}
console.log(itemProp);
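For anyone who wants to start from the Product element itself, as in my first attempt, here is a minimal sketch of a variant. It assumes an itemprop belongs to the nearest itemscope above it, so it starts the closest() lookup from the parent; that way a nested item such as an offers element is kept as a property of the Product while its own children are skipped. ownItemProps is a hypothetical helper name, it reads textContent only (no special handling for meta or link values), and it ignores itemref.

```javascript
// Hypothetical helper: returns the itemprop values that belong directly to
// `root`, skipping properties of nested items (breadcrumbs, offers, ...).
function ownItemProps(root) {
  const props = {};
  for (const el of root.querySelectorAll('[itemprop]')) {
    // Look upward from the parent so an element that has both itemprop and
    // itemscope is attributed to the item that contains it, not to itself.
    const parent = el.parentElement;
    const owner = parent ? parent.closest('[itemscope]') : null;
    if (owner !== root) continue;
    props[el.getAttribute('itemprop')] = (el.textContent || '').trim();
  }
  return props;
}

// Browser-only usage; skipped when there is no DOM (e.g. under Node).
if (typeof document !== 'undefined') {
  const productRoot = document.querySelector(
    '[itemscope][itemtype="https://schema.org/Product"], [itemscope][itemtype="http://schema.org/Product"]'
  );
  if (productRoot) console.log(ownItemProps(productRoot));
}
```

Comparing the owner against the root element, rather than comparing itemtype strings, also avoids mixing values together when a page contains more than one Product.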