How can I make a regex that matches any combination of "<br>", "<br/>", and "<p>" that appears at the start of a line?

Question

I want to match all "new line" type HTML tags (breaks and paragraphs), no matter how many there are or in what order they appear, as long as they appear at the beginning of a line.

This regex pattern matches the first one: ^<[Bb][Rr] ?/?>|^<[Pp]>

So, given the text <p><br>fred, it matches the leading <p> but not the <br> that immediately follows it.

Note that I don't want to remove every one of these tags, but only those which appear at the beginning of the input line.
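
To make the problem concrete, here is a minimal sketch of the behaviour described above. It assumes Python's `re` module (the question does not name a language), and the variable names are only illustrative.

```python
import re

# Pattern from the question: each alternative is anchored with ^,
# but nothing is repeated, so at most one tag can be consumed.
single = re.compile(r"^<[Bb][Rr] ?/?>|^<[Pp]>")

text = "<p><br>fred"
m = single.match(text)

first_match = m.group(0)    # only the leading <p>
remainder = text[m.end():]  # the <br> is left behind

print(repr(first_match), repr(remainder))
```

The `<br>` stays in the remainder because the pattern stops after one alternative has matched.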

joeframbach is referring to [this famous question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags), in case you weren't aware. — Matt Ball, Apr 14 '12 at 14:24

score 4 · Accepted Answer · answered Apr 14 '12 at 14:11

I would also add support for white spaces between the tags:

^(?:(?:<[Bb][Rr]>\s*)|(?:<[Pp]\s*>))+


Joanna Derks



I fleshed it out a little more: ^(?:(?:<[Bb][Rr]\s*?/?>\s*)|(?:<[Pp]?/?\s*>\s*))+ – jalperin Apr 14 '12 at 16:43
The optional slashes make sense. As for `[Pp]?`, shouldn't that be `[Pp]\s*?` instead? Also, to keep both cases consistent, this syntax could be used: `\s*?/?\s*>` – Joanna Derks Apr 15 '12 at 13:31
So the whole thing would look like this: `^(?:(?:<[Bb][Rr]\s*?/?\s*>\s*)|(?:<[Pp]\s*?/?\s*>\s*))+` – Joanna Derks Apr 15 '12 at 13:38
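
A rough sketch of how that final pattern could be used to strip the leading tags from every line. It assumes Python's `re` module with `re.MULTILINE` so that `^` matches at each line start; `LEADING_TAGS`, `strip_leading_tags`, and the sample text are hypothetical names and data chosen for illustration.

```python
import re

# The consolidated pattern from the comment above, unchanged.
# re.MULTILINE makes ^ match at the start of every line, not just the string.
LEADING_TAGS = re.compile(
    r"^(?:(?:<[Bb][Rr]\s*?/?\s*>\s*)|(?:<[Pp]\s*?/?\s*>\s*))+",
    re.MULTILINE,
)


def strip_leading_tags(text: str) -> str:
    """Remove break/paragraph tags only where they start a line."""
    return LEADING_TAGS.sub("", text)


sample = "<p><br>fred\n<BR /> <p>wilma<br>\nbarney<p>"
print(strip_leading_tags(sample))
```

Tags in the middle or at the end of a line are left alone. One caveat: `\s` also matches newlines, so a line that contains nothing but these tags would be merged with the line that follows it. If that matters, `[ \t]*` may be a safer choice than `\s*`.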

Matt Ball · Answer 2 · 2012-04-14T16:56:25.520

2

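You need to repeat the group so that every consecutive tag is matched, not just the first one. Keep a single ^ in front of the group: the second ^ from your pattern would prevent <p> from matching anywhere except at the very start of the line.

^(<[Bb][Rr] ?/?>|<[Pp]>)+

Also, this is clearer and more concise if you use a case-insensitive flag instead of character classes:

^(<br ?/?>|<p>)+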
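
As a sketch of the flag-based version, assuming Python's `re` module (the answer does not say which regex engine is in use): `re.IGNORECASE` stands in for the character classes, and the anchor is written once, outside the group. The names `LEADING`, `INNER_ANCHOR`, and `leading_tags` are made up for this example.

```python
import re

# One anchor outside the group; IGNORECASE replaces [Bb][Rr] and [Pp].
LEADING = re.compile(r"^(?:<br ?/?>|<p>)+", re.IGNORECASE | re.MULTILINE)

# For comparison: a variant that repeats the anchor inside the group.
INNER_ANCHOR = re.compile(r"^(?:<br ?/?>|^<p>)+", re.IGNORECASE | re.MULTILINE)


def leading_tags(pattern: re.Pattern, line: str) -> str:
    """Return the run of tags matched at the start of the line, or ''."""
    m = pattern.match(line)
    return m.group(0) if m else ""


print(repr(leading_tags(LEADING, "<BR><p><br />fred")))
print(repr(leading_tags(INNER_ANCHOR, "<br><p>fred")))
```

With the inner `^`, the `<p>` alternative can only succeed at a line start, so the repetition stops after the first `<br>`.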

edited Apr 14 '12 at 16:56

answered Apr 14 '12 at 14:07

Matt Ball


I believe there's a small typo. Should have the final > before the final ) like this: ^(<[Bb][Rr] ?/?>|^<[Pp]>)+ – jalperin Apr 14 '12 at 16:42
