
I need a preg_match_all pattern that returns only URLs linking to other pages. It needs to exclude images, CSS, JS, etc.

What would be the best way to match only URLs that point to other pages?

This is what I have so far:

/(\s|\b)(http:\/\/|https:\/\/|www\.)[\.a-z0-9]+\.[a-z0-9\/\?=]+(\s|\b)/

I need it to match only URLs inside the href attribute of an anchor tag. It shouldn't match URLs ending in .css, .jpg, .png, etc.

I have no idea how to modify it, though.

Dean Harber
  • 1
    Depending on the regex, an assertion. Else iterating over the result list and removing the unwanted results. – mario Aug 04 '14 at 21:21
  • 1
    It sounds like you're [parsing HTML here](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php). [You shouldn't use regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) [to do this](http://programmers.stackexchange.com/questions/113237/when-you-should-not-use-regular-expressions). – scrowler Aug 04 '14 at 21:34
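
For illustration, the two suggestions in the comments (parse the HTML instead of using a regex, then iterate over the results and drop the unwanted ones) could be combined roughly as follows. This is only a sketch: the function name `extractPageLinks` and the list of skipped extensions are assumptions made for the example, not something from the question, and it reads the href of `<a>` elements with PHP's built-in `DOMDocument`.

```php
<?php
// Hypothetical helper (name and extension list are assumptions for this sketch).
// Returns the href values of <a> tags, skipping links to static assets.
function extractPageLinks(string $html): array
{
    // Extensions treated as "not a page"; adjust to taste.
    $skip = ['css', 'js', 'jpg', 'jpeg', 'png', 'gif', 'svg', 'ico'];

    // Nothing to parse; also avoids loadHTML() complaining about empty input.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world markup is often malformed; collect libxml errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href === '') {
            continue;
        }
        // Look only at the path, so query strings like ?v=2 do not hide the extension.
        $path = parse_url($href, PHP_URL_PATH);
        $ext  = is_string($path) ? strtolower(pathinfo($path, PATHINFO_EXTENSION)) : '';
        if (!in_array($ext, $skip, true)) {
            $links[] = $href;
        }
    }

    // Remove duplicates and reindex.
    return array_values(array_unique($links));
}

$html = '<a href="http://example.com/about">About</a>'
      . '<a href="/style.css">CSS</a>'
      . '<img src="photo.png">'
      . '<a href="/img/logo.png?v=2">Logo</a>';

print_r(extractPageLinks($html)); // only http://example.com/about remains
```

Because only `<a>` elements are inspected, `<img>`, `<link>` and `<script>` URLs never enter the result in the first place; the extension check is needed only for anchors that point directly at an asset.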
