
I'm looking for feedback on the regex below, which I use in PHP to detect links to PDF files in a given string.

/href=["\']?[^"\'>]+(?:\.pdf)["\']?/

This seems to do what I want in cursory tests, but I'm wondering whether it needs to be more robust for edge cases. For one thing, it isn't limited to <a> tags: it matches any element with an href attribute containing .pdf. Is there anything else I'm missing? What about case sensitivity for .PDF?
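For illustration, the two concerns above (restricting matches to <a> tags and accepting .PDF) could be handled roughly as follows. This is only a sketch: the sample markup and variable names are assumptions, and it still shares the general fragility of regex-based HTML matching.

```php
<?php
// Sketch only: the sample markup and variable names are assumptions.
$html = '<a href="report.PDF">Report</a> '
      . '<a href=\'docs/guide.pdf\'>Guide</a> '
      . '<a href="page.html">Page</a>';

// <a\b   : only match inside an <a> tag
// [^>]*  : skip any other attributes that precede href
// \.pdf  : literal ".pdf"; the trailing "i" modifier also accepts ".PDF"
$pattern = '/<a\b[^>]*\bhref=["\']?([^"\'>]+\.pdf)["\']?/i';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured href values.
print_r($matches[1]);
```

The `i` modifier makes the whole pattern case-insensitive, so `HREF=` and `<A` would match as well.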

  • See [how do I make this preg_match case insensitive](http://stackoverflow.com/questions/12411037/how-do-i-make-this-preg-match-case-insensitive). – Wiktor Stribiżew Nov 01 '16 at 14:45
  • If you want the robust solution use a parser. Then you can target elements, not worry about encapsulation, whitespace, escaping, self-closing, etc. – chris85 Nov 01 '16 at 15:15
  • Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – chris85 Nov 01 '16 at 15:16
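
The parser-based approach suggested in the comments might look something like the sketch below, using PHP's bundled DOM extension. The helper name `findPdfLinks()` is hypothetical, and treating a link as a PDF when its URL path ends in `.pdf` is an assumption rather than something stated in the question.

```php
<?php
// Sketch only: findPdfLinks() is a hypothetical helper, not an existing API.
function findPdfLinks(string $html): array
{
    // Collect parse warnings internally instead of emitting them;
    // real-world HTML is often not well-formed.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    // Only <a> elements are inspected, so other tags with href are ignored.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');

        // Look at the URL path only, so "file.pdf?download=1" still counts.
        $path = (string) parse_url($href, PHP_URL_PATH);

        // Case-insensitive extension check covers ".pdf" and ".PDF".
        if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'pdf') {
            $links[] = $href;
        }
    }

    return $links;
}
```

Because the parser handles quoting and whitespace, the extension check can stay a plain string comparison instead of being folded into the pattern.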
