Trying to extract WordPress post ID with RegEx or XPath

Question

I've combed the internet and Stack Overflow and I can't find an expression that works. I'm trying to pull post IDs from a WordPress site that someone else built, and I can't find an XPath or regex string that will work in Screaming Frog. The challenge, I think, is that the HTML element that contains the ID also includes the ID in its class name:

<article id="post-242091" class="post-242091 page type-page status-publish hentry entry">

When I try to pull the element or the outer HTML in dev tools, it pulls the entire page, which leads me to believe that something is missing a closing tag, but I could be wrong. I've also tried copying the XPath and the full XPath from Inspect, and no dice. Some of the combos I've tried:

//article[@id='.*']
//article[@id='.*?']
//article[@class='.*?']
//article[@id='.*?']
//article[@id=post-.*]/class/text()

What am I doing wrong?

I tried a variety of XPath and regex combos, hoping that a spider could extract the inner HTML of this tag, but the ID and the class are unique on each page. If I could extract the value of either of those attributes, that would be great, but I've not been able to.

For "extracting" the post id you can just use `//article/@id` — Fravadona, Aug 22 '23 at 20:31
@Fravadona - that worked perfectly! Thank you so much; I'm a little embarrassed I didn't try that one! — John Alexander, Aug 22 '23 at 22:26
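
For anyone who lands here later, a rough sketch of why the attempts in the question returned nothing and why the attribute approach works. XPath 1.0 compares `@id='.*'` as a literal string, not as a regex, so no element matches. The snippet below uses Python's standard library only; `SAMPLE` is the markup from the question wrapped in a `<body>` and closed so it parses, and `extract_post_id` is a hypothetical helper name, not anything Screaming Frog provides. The standard-library parser does not support the `/@id` step, so the attribute is read with `.get("id")` as the equivalent.

```python
import re
import xml.etree.ElementTree as ET

# Markup from the question, wrapped and closed so it is well-formed (assumption).
SAMPLE = (
    '<body>'
    '<article id="post-242091" '
    'class="post-242091 page type-page status-publish hentry entry">'
    '</article>'
    '</body>'
)


def extract_post_id(markup):
    """Return the numeric part of the first <article> id, or None (hypothetical helper)."""
    root = ET.fromstring(markup)
    article = root.find(".//article")  # first <article> anywhere under the root
    if article is None:
        return None
    # Stand-in for //article/@id: read the attribute value directly.
    match = re.fullmatch(r"post-(\d+)", article.get("id", ""))
    return match.group(1) if match else None


root = ET.fromstring(SAMPLE)

# The predicate is a literal string comparison: no id is literally ".*".
literal_matches = root.findall(".//article[@id='.*']")

print(literal_matches)          # empty list: the "regex" is never interpreted
print(extract_post_id(SAMPLE))  # numeric part of "post-242091"
```

If only the number is wanted inside the crawler itself, `substring-after(//article/@id, 'post-')` is a standard XPath 1.0 expression that should return it, assuming the tool is set to evaluate function expressions rather than only node selections.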
