I want to match all href attributes whose value is a local file path rather than a URL, so I wrote this PHP code:

preg_match_all('/href="[^(http|https):\/\/](.*?)"/is',$tidyHtml,$matches);

This returns an array like this:

Array
(
    [0] => tatic/css/custom.css
)

But when I look at the HTML source, the href attribute is static/css/custom.css, not tatic/css/custom.css.

Unihedron
Mokhtarabadi
  • Why does this regex have square brackets? – Jon Aug 14 '14 at 15:02
  • [Stop using regex for this](http://stackoverflow.com/a/1732454/19068) – Quentin Aug 14 '14 at 15:03
  • @Unihedron: There's a difference between writing a quick, custom regex (which doesn't work and you have to ask for help with) to pull code out of some HTML, and writing a proper HTML parser that uses regex (as that answer appears to do). – Quentin Aug 14 '14 at 15:14

2 Answers

You have to get rid of the square brackets. They denote a character class, that is, a set of individual characters to match, not a group.
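
As a rough illustration of that point, the sketch below tests the bracket expression from the question against single characters. The variable names and the sample characters are assumptions made for the demo. Inside the brackets, the parentheses and the pipe lose their grouping meaning and simply become members of the set, so the negated class matches exactly one character that is not in it:

```php
<?php
// The bracket expression from the question, anchored so that it must
// match a whole one-character string.
$class = '/^[^(http|https):\/\/]$/';

// Sample characters chosen for illustration only.
$results = [];
foreach (['x', 'h', '(', '|', ':'] as $char) {
    // preg_match() returns 1 on a match and 0 on no match.
    $results[$char] = preg_match($class, $char);
}

// Only "x" is outside the set, so only "x" yields 1.
print_r($results);
```

Because the class always consumes one character before the capture group starts, that character can never be part of the captured value.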

You also have to use a negative lookahead instead of a negated character class. This regex performs the match:

/(?<=href=")(?!https?:\/\/)(.*?)"/is
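
A minimal sketch of how this pattern might be used with preg_match_all. The sample markup is hypothetical and only stands in for the question's $tidyHtml; $localPaths is an illustrative name:

```php
<?php
// Hypothetical sample markup standing in for $tidyHtml from the question.
$tidyHtml = '<link href="static/css/custom.css">'
          . '<a href="http://example.com/">one</a>'
          . '<a href="https://example.com/a">two</a>';

// The lookbehind anchors the match right after href=",
// and the lookahead rejects values starting with http:// or https://.
preg_match_all('/(?<=href=")(?!https?:\/\/)(.*?)"/is', $tidyHtml, $matches);

// Group 1 holds the attribute value without the closing quote.
$localPaths = $matches[1];
print_r($localPaths);
```

Note that $matches[0] still includes the closing double quote, because the quote is matched outside the capture group, so $matches[1] is the array to read.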

How to add src attribute to this regex?

You can simply use alternation (|) inside the lookbehind:

/(?<=href="|src=")(?!https?:\/\/)(.*?)"/is

foobar
Unihedron

To solve this problem you have to use a negative lookahead, like this:

/href="(?!https?:\/\/)(.*?)"/is

It looks for href=" anywhere in the string, then peeks ahead to check whether the next characters are http, an optional s, and ://. If they are not, it moves forward and captures everything up to the first double quote ".
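
The behaviour described above can be sketched on two single tags. Both input strings are made up for the demo, and $hit, $miss, $m and $unused are illustrative names:

```php
<?php
$pattern = '/href="(?!https?:\/\/)(.*?)"/is';

// Local path: the lookahead succeeds, so the value is captured.
$hit = preg_match($pattern, '<link href="static/css/custom.css">', $m);
// $m[0] is the whole match, including href=" and the closing quote.
// $m[1] is the attribute value only.

// Absolute URL: the lookahead fails right after href=", so nothing matches.
$miss = preg_match($pattern, '<a href="https://example.com/">home</a>', $unused);

var_dump($hit, $m, $miss);
```

Since the lookahead consumes no characters, the first letter of the path stays available to the capture group.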

How to add src attribute to this regex? – Mokhtarabadi

This is actually quite simple: use alternation to check for the occurrence of href or src:

/(?:href|src)="(?!https?:\/\/)(.*?)"/is

would then be the resulting regex checking for both the href and the src attribute, matching only references that do not start with http:// or https://.
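
A short sketch of the combined pattern on mixed markup. The HTML string and the name $local are assumptions for illustration, and the upper-case HREF is there only to show the effect of the i flag:

```php
<?php
// Hypothetical markup mixing href and src, local and absolute references.
$html = '<link HREF="static/css/custom.css">'
      . '<img src="img/logo.png">'
      . '<script src="https://cdn.example.com/app.js"></script>'
      . '<a href="http://example.com/">home</a>';

// (?:href|src) is a non-capturing group, so the value stays in group 1.
preg_match_all('/(?:href|src)="(?!https?:\/\/)(.*?)"/is', $html, $matches);

$local = $matches[1];
print_r($local);
```

Keeping the attribute name in a non-capturing group means existing code that reads $matches[1] does not need to change when src is added.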

foobar
  • Well thanks, you are right, I got confused. But I don't know how to rollback to your revision, Unihedron. – foobar Aug 14 '14 at 15:27