3

I want to match all "new line" type html tags (breaks and paragraphs) no matter how many and in what order they appear, so long as they appear at the beginning of a line.

This regex pattern matches the first one: ^<[Bb][Rr] ?/?>|^<[Pp]>

So, given the text <p><br>fred, it matches the first <p> but not the <br> that immediately follows it.

Note that I don't want to remove every one of these tags, but only those which appear at the beginning of the input line.

Xiddoc
jalperin
  • joeframbach is referring to [this famous question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags), in case you weren't aware. – Matt Ball Apr 14 '12 at 14:24

2 Answers

4

I would also add support for white spaces between the tags:

^(?:(?:<[Bb][Rr]>\s*)|(?:<[Pp]\s*>))+
Joanna Derks
  • I fleshed it out a little more: `^(?:(?:<[Bb][Rr]\s*?/?>\s*)|(?:<[Pp]?/?\s*>\s*))+` – jalperin Apr 14 '12 at 16:43
  • Makes sense with the optional slashes. As for `[Pp]?` - should it have been rather `[Pp]\s*?` ? Also to make it consistent in both cases this syntax could be used `\s*?/?\s*>` – Joanna Derks Apr 15 '12 at 13:31
  • So the whole thing would look like this: `^(?:(?:<[Bb][Rr]\s*?/?\s*>\s*)|(?:<[Pp]\s*?/?\s*>\s*))+` – Joanna Derks Apr 15 '12 at 13:38
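
To check how the pattern from the last comment behaves, here is a minimal Python sketch. The pattern is copied unchanged; the `re.MULTILINE` flag, the helper name `strip_leading_tags`, and the sample text are assumptions made for illustration, since the thread does not name a regex engine.

```python
import re

# Pattern taken verbatim from the comment above.
# Assumption: re.MULTILINE is set so that ^ anchors at the start of every line,
# not only at the start of the whole string.
LEADING_TAGS = re.compile(
    r"^(?:(?:<[Bb][Rr]\s*?/?\s*>\s*)|(?:<[Pp]\s*?/?\s*>\s*))+",
    re.MULTILINE,
)


def strip_leading_tags(text: str) -> str:
    """Remove runs of <br>/<p> tags found at the start of a line (hypothetical helper)."""
    return LEADING_TAGS.sub("", text)


sample = "<p> <br />fred<br>wilma\n<BR><P>barney"
print(strip_leading_tags(sample))
# fred<br>wilma
# barney
```

The `<br>` in the middle of the first line is left alone because it does not start a line. Note that the trailing `\s*` also matches newlines, so a line consisting only of tags is merged into the line that follows it.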
2

You need some repetition.

^(<[Bb][Rr] ?/?>|<[Pp]>)+

Also, this is clearer and more concise if you use a case-insensitivity flag instead of character classes.

^(<br ?/?>|<p>)+
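
As a rough illustration of the flag-based variant, the sketch below uses Python's `re.IGNORECASE`. It assumes the alternation carries no `^` of its own, because an anchor inside the repeated group would stop a `<p>` that follows a `<br>` from matching. The helper name `leading_tags`, the non-capturing group, and the sample inputs are illustrative choices, not part of the answer.

```python
import re

# re.IGNORECASE stands in for the [Bb][Rr] and [Pp] character classes.
# Assumption: only the leading ^ is kept; the group itself is unanchored
# so the tags can repeat in any order.
LEADING = re.compile(r"^(?:<br ?/?>|<p>)+", re.IGNORECASE)


def leading_tags(line: str) -> str:
    """Return the run of break/paragraph tags at the start of line, or '' if none."""
    m = LEADING.match(line)
    return m.group(0) if m else ""


print(leading_tags("<p><br>fred"))       # <p><br>
print(leading_tags("<BR/><P>fred<br>"))  # <BR/><P>
print(leading_tags("fred<p>"))           # (empty string)
```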
Matt Ball
  • I believe there's a small typo. Should have the final > before the final ) like this: ^(<[Bb][Rr] ?/?>|^<[Pp]>)+ – jalperin Apr 14 '12 at 16:42