Similar regular expression in same line

Question

I need a regex for the following requirement.

Sample string:

<html> a test of  strength and <h1> valour </h1> for <<<NOT>>> faint hearted <b> BUT </b> protoganist having their characters <<<CARVED>>> out of gibralter <b> ROCK </b>

The above is a single string from which I want to strip every HTML tag while retaining markers of the form <<<xyz>>>.

My attempt:

(^|\n| )<[^>]*>(\n| |$)

Can someone please critically review this?

Have you considered using an HTML parser? It would make this task trivial. If you have a good reason not to, please tell us which environment (language, tool) you're running your regex in, as possible solutions will depend on that — Aaron, Jan 21 '20 at 14:21
No, I have not considered it, as I am unaware of it. My language is VBScript/VBA. Are there any references for an HTML parser usable from VBScript? — IrateINWIT, Jan 21 '20 at 14:38
I'm not familiar with vbscript, but [this question](https://stackoverflow.com/questions/16629228/extract-text-between-html-tags) seems to address that topic well — Aaron, Jan 21 '20 at 14:40
I'm voting to close this question as off-topic because it belongs on [Code Review](https://codereview.stackexchange.com/). — user692942, Jan 21 '20 at 14:56
Obligatory [parsing HTML with RegEx warning](https://stackoverflow.com/a/1732454/1014587). — Mast, Jan 21 '20 at 15:05

Matt Cremeens · Accepted Answer · 2020-01-21T15:04:25.340


This is what I've come up with. It uses lookaround assertions (written here with the (?=...) lookahead syntax) to identify HTML tags by what precedes and follows them, without including those neighbouring characters in the match. The idea is to match < and > only when they are next to spaces or letters, not next to other < or >. Is this what you are after, or did I misread you?

(?=([ A-z]?))<{1}\/?[A-z1-6]+>{1}(?=[^>])
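For anyone who wants to check the pattern above against the sample string from the question, here is a minimal sketch. It uses Python's `re` module rather than VBScript (an assumption made only so the check is easy to run; the pattern relies on lookaheads and character classes, which both engines support). The names `SAMPLE`, `ANSWER_PATTERN` and `WITHOUT_LEADING_LOOKAHEAD` are made up for the illustration.

```python
import re

# Sample string from the question, with no trailing whitespace.
SAMPLE = (
    "<html> a test of  strength and <h1> valour </h1> for <<<NOT>>> faint hearted "
    "<b> BUT </b> protoganist having their characters <<<CARVED>>> out of "
    "gibralter <b> ROCK </b>"
)

# The answer's pattern, copied as written.
ANSWER_PATTERN = re.compile(r"(?=([ A-z]?))<{1}\/?[A-z1-6]+>{1}(?=[^>])")

# Same pattern minus the leading lookahead. Its content is optional, so that
# lookahead can never fail and dropping it should not change the result.
WITHOUT_LEADING_LOOKAHEAD = re.compile(r"<{1}\/?[A-z1-6]+>{1}(?=[^>])")

stripped = ANSWER_PATTERN.sub("", SAMPLE)
print(stripped)

# The trailing lookahead (?=[^>]) needs one more character after the tag,
# so a tag that ends the string is left in place.
print(stripped.endswith("</b>"))
```

Two things seem worth noting from this check. The `<<<NOT>>>` and `<<<CARVED>>>` markers survive because the character after the candidate tag's `>` is another `>`. The final `</b>` also survives, since nothing follows it for `(?=[^>])` to match.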

edited Jan 21 '20 at 15:04

answered Jan 21 '20 at 14:32

It's because the last `</b>` ended the string. I made an edit to address this. My apologies. – Matt Cremeens Jan 21 '20 at 15:04
I made a final modification, as follows: `(?=([ A-z]?))<{1}\/?[A-z1-6]+>{1}(?=[ A-z]|$)` – IrateINWIT Jan 21 '20 at 15:12
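As a rough sanity check of the final pattern, the sketch below (again Python's `re`, chosen only for convenience) runs it on an input slightly different from the question's sample. It then tries one possible alternative that is not from this thread: match a `<<<...>>>` marker first and capture it, otherwise match anything that looks like a tag, and put only the captured marker back. `TAG_OR_MARKER`, `strip_tags_keep_markers` and the `tricky` input are hypothetical names and data for the illustration.

```python
import re

# Final pattern from the comment above, copied as written.
FINAL_PATTERN = re.compile(r"(?=([ A-z]?))<{1}\/?[A-z1-6]+>{1}(?=[ A-z]|$)")

# Hypothetical alternative (not from the thread): capture a <<<...>>> marker
# if one starts here, otherwise match something shaped like an HTML tag.
TAG_OR_MARKER = re.compile(r"(<<<[^<>]*>>>)|</?[A-Za-z][^<>]*>")


def strip_tags_keep_markers(text: str) -> str:
    """Delete tag-like matches; put captured <<<...>>> markers back unchanged."""
    return TAG_OR_MARKER.sub(lambda m: m.group(1) or "", text)


# A tag with an attribute, and a closing tag followed by punctuation.
tricky = '<p class="x">for <<<NOT>>> faint</p>.'

print(FINAL_PATTERN.sub("", tricky))    # both tags are left in place
print(strip_tags_keep_markers(tricky))  # tags removed, marker kept
```

The final pattern leaves both tags alone here. `[A-z1-6]+` cannot cross the space before `class`, and the closing lookahead accepts only a space, a character in `A-z`, or the end of the string, so a tag followed by `.` is skipped. The alternation avoids lookarounds entirely, so it should carry over to VBScript's RegExp object with `Global = True` and a `$1` replacement, though that is untested here. It is still a regex approximation: comments, scripts, or a `>` inside an attribute value would need a real HTML parser, as the comments under the question point out.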
