I have a web page that I need to extract information from.
There are multiple <article> tags that need to be cycled through (I need to extract the content from within them). Each article tag has many attributes: "id", "class", etc.
I have no idea how to write the regex that I need.
What I have so far is:
<article ([a-zA-Z\s"\S][^>]*)>
This matches all the opening tags along with their attributes; however, I don't know how to capture the information WITHIN the tags.
I feel like I need to write a regex similar to: "get everything within <article ([a-zA-Z\s"\S][^>]*)> until you reach the next </article> tag", but I have no idea how to do that...
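To make the "everything until the next </article>" idea concrete, here is a minimal sketch in Python. It assumes the page is already available as a string and that <article> tags are never nested; the sample HTML and the name extract_articles are made up for illustration. It uses a lazy match (.*?) together with re.DOTALL so the content can span multiple lines:

```python
import re

# Hypothetical sample page; the real HTML would come from the web page.
SAMPLE_HTML = """
<article id="a1" class="post">
  <p>First article</p>
</article>
<article id="a2" class="post featured"><p>Second article</p></article>
"""

# Group 1: the attributes of the opening tag.
# Group 2: everything up to the next </article>, matched lazily with .*?
# re.DOTALL lets "." match newlines, so multi-line content is captured too.
ARTICLE_RE = re.compile(
    r'<article\b([^>]*)>(.*?)</article>',
    re.DOTALL | re.IGNORECASE,
)

def extract_articles(html):
    """Return a list of (attributes, inner_content) tuples, one per article."""
    return [
        (m.group(1).strip(), m.group(2).strip())
        for m in ARTICLE_RE.finditer(html)
    ]

for attrs, content in extract_articles(SAMPLE_HTML):
    print(attrs, "->", content)
```

This would break on nested <article> tags, and an HTML parser is generally more robust than a regex for this kind of task.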
Thanks for your input