I am trying to find a specific string that contains a keyword inside a title tag in HTML, e.g.
<title>Bla bla bla String bla bla</title>
I am unsure how to construct the regex beyond this starting point:
\<title\>(Word Keyword)\<\/title\>
I also want to make sure that, if I use any wildcards, the wildcard between the keyword and the closing tag doesn't inadvertently run all the way to the end of another title block later in the HTML.
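To make the concern concrete, here is a minimal sketch of the kind of thing I mean, assuming Python's `re` module and assuming the title contains only text (no nested tags). The `\w+ Keyword` shape for "Word Keyword" and the sample `html` string are placeholders of mine, not a known-good solution. A greedy `.*` runs to the last closing tag, whereas `[^<]*` cannot cross a `<` and so should stay inside one block:

```python
import re

# Hypothetical sample input with two separate title blocks.
html = (
    "<title>Bla bla bla Keyword bla bla</title>"
    "<p>x</p>"
    "<title>Other Keyword here</title>"
)

# Greedy wildcard: .* swallows everything up to the LAST </title>.
greedy = re.search(r"<title>.*</title>", html).group(0)

# [^<]* cannot cross a '<', so each match stays inside one title block.
# The lazy [^<]*? lets the capture start at the word just before "Keyword".
pattern = re.compile(r"<title>[^<]*?(\w+ Keyword)[^<]*</title>")
terms = pattern.findall(html)

print(greedy == html)  # the greedy match spans both title blocks
print(terms)
```

With a single capture group, `findall` returns only the captured text rather than the whole match.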
Lastly, I'm trying to find a way to then:
- extract only the Word Keyword, even though I've captured the entire match
- extract/keep the tag name separately.
This is because I'll have several types of tag to capture from, and I want to extract both the 'Word Keyword' and the tag name it came from. Is this possible? I've looked into named groups, but I'm not sure if/how to extract the values after matching, e.g.
(?P<TAG>(\<title\>|\<head\>))(?P<TERM>(Word Keyword))\<\/title\>
Naturally, the above would need whatever wildcard code makes it work, but assuming it does, I'd then want to be able to extract, after matching the string:
- title
- Bla Keyword
or
- head
- Yada Keyword
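In other words, something along the lines of this rough sketch, assuming Python's `re` module. The group names `TAG` and `TERM` are from my attempt above. The `\w+ Keyword` shape, the sample `html`, and the assumption that each element holds only plain text are hypothetical. Here `TAG` captures just the tag name, and `(?P=TAG)` is a backreference so the closing tag has to match the opening one:

```python
import re

# Hypothetical sample input: one title element and one head element,
# each containing only plain text (an assumption for this sketch).
html = (
    "<title>Some Bla Keyword text</title>"
    "<head>More Yada Keyword text</head>"
)

pattern = re.compile(
    r"<(?P<TAG>title|head)>"   # tag name only, without the angle brackets
    r"[^<]*?"                  # lazy filler that cannot leave the element
    r"(?P<TERM>\w+ Keyword)"   # placeholder shape for 'Word Keyword'
    r"[^<]*"                   # rest of the text up to the closing tag
    r"</(?P=TAG)>"             # closing tag must repeat the opening name
)

# Each match object exposes the named groups separately from the full match.
results = [(m.group("TAG"), m.group("TERM")) for m in pattern.finditer(html)]

for tag, term in results:
    print(tag, "->", term)
```

`m.group(0)` would still give the entire matched element, while `m.group("TAG")` and `m.group("TERM")` give the two pieces on their own; `m.groupdict()` returns both as a dictionary.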