
I'm trying to create a regex that extracts the HTML opening tags from a string in PHP.

So far I have come up with the pattern /\<[^/>]*\>/, which seems to work on https://regexr.com/49vgk.

But as soon as I copy it into PHP, I get this error: PHP preg_match_all(): Unknown modifier '>'

PHP Code:

$input = '<p>This is my HTML text that I want <b>all</b> opening tags from</p>';

$regexPattern = '/\<[^/>]*\>/';
$openingTags = preg_match_all($regexPattern, $input);

So far I have been unable to figure out what is causing this, especially since I have escaped most of the special characters.

Does someone in the Stack Overflow community know what I'm doing wrong and, if so, could you explain it to me?

Thanks in advance.

deceze
  • Welcome to Stack Overflow. Please realise that parsing HTML with regex is bad practice. The more you rely on it, the more it disappoints. – trincot Mar 11 '19 at 13:25
  • As to your question, since you use `/` as regex delimiter, you must escape that character within the regex. Now PHP thinks your regex ends just before that first `>`. – trincot Mar 11 '19 at 13:26
  • Whereas the angle brackets didn't really require escaping. – mario Mar 11 '19 at 13:28

1 Answer

First of all, using regex to parse HTML is evil.

Now that this is out of the way, here is a working script:

$input = '<p>This is my HTML text that I want <b>all</b> opening tags from</p>';
$regexPattern = '/<[^\/][^>]*>/';
preg_match_all($regexPattern, $input, $matches);
print_r($matches[0]);

This prints:

Array
(
    [0] => <p>
    [1] => <b>
)

Here is an explanation of the pattern <[^\/][^>]*>:

<      match an opening angle bracket
[^\/]  match a single character other than / (this is what excludes closing tags such as </p>)
[^>]*  then match zero or more characters that are not a closing angle bracket
>      match a closing angle bracket
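
To see how the parts of the pattern behave beyond the original input, here is a minimal sketch that runs it against a few extra strings. The helper name openingTags and the sample inputs are made up for illustration and are not part of the question.

```php
<?php
// Hypothetical helper (not from the question): returns every match of the
// pattern <[^\/][^>]*> found in $html.
function openingTags(string $html): array
{
    // '<', then one character that is not '/', then anything up to the next '>'.
    preg_match_all('/<[^\/][^>]*>/', $html, $matches);
    return $matches[0];
}

// Plain opening tags are matched, closing tags are skipped.
print_r(openingTags('<p>text <b>all</b> more</p>'));     // <p> and <b>

// Attributes are included in the match; a self-closing tag also matches.
print_r(openingTags('<a href="x.html">link</a><br/>'));  // <a href="x.html"> and <br/>

// Only closing tags: nothing matches.
print_r(openingTags('</p></b>'));                        // empty array
```

Note that the pattern only checks the first character after <, so self-closing tags like <br/> and HTML comments such as <!-- ... --> are matched as well. Whether that is acceptable depends on your input.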

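As for your current error: you have defined / as the delimiter of the regex pattern. PHP therefore treats the unescaped / inside [^/>] as the end of the pattern and tries to read the > that follows as a pattern modifier, hence "Unknown modifier '>'". If you want a literal forward slash inside the pattern, you must escape it (as you would a regex metacharacter). The angle brackets, on the other hand, do not need escaping.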
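
If escaping the slash feels error-prone, one option is to pick a delimiter that does not occur in the pattern. The sketch below uses ~, which is an arbitrary choice made for this example (PHP accepts any non-alphanumeric, non-backslash, non-whitespace character). It also shows preg_quote() with its second argument, which escapes the delimiter for you. The variable names $slash and $viaQuote are made up for illustration.

```php
<?php
$input = '<p>This is my HTML text that I want <b>all</b> opening tags from</p>';

// With '~' as the delimiter, '/' is an ordinary character and needs no escaping.
preg_match_all('~<[^/][^>]*>~', $input, $matches);
print_r($matches[0]); // <p> and <b>

// Alternatively, keep '/' as the delimiter and let preg_quote() escape it.
// The second argument names the delimiter that should be escaped too.
$slash = preg_quote('/', '/'); // backslash followed by a slash
preg_match_all('/<[^' . $slash . '][^>]*>/', $input, $viaQuote);
print_r($viaQuote[0]); // same result as above
```

Both variants produce the same matches as the escaped pattern, so which one to use is mostly a matter of readability.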

E_net4
Tim Biegeleisen