How can I transform this regex so that it searches only inside src=* attributes, instead of matching every link that starts with http and ends with jpg, png, or gif? Additionally, I want to match https images as well. Thank you!

preg_match('!http://.+\.(?:jpe?g|png|gif)!Ui' , $content , $matches);
EnexoOnoma
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Apr 03 '11 at 21:40
  • And for the comedy value... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – John Parker Apr 03 '11 at 21:42
  • Hi, thanks for the info but i didn't ask for this – EnexoOnoma Apr 03 '11 at 21:47
  • @middaparka: It's more than comedy... it's truth – Lightness Races in Orbit Apr 03 '11 at 21:55
  • @Tomalak The best comedy is often based on the truth. :-) – John Parker Apr 03 '11 at 21:57
  • @Tomalak it is comedy because apart from the funny rant, the answer is wrong. Regex can parse HTML. It's just not practical to do so in most cases when there is parsers readily available. – Gordon Apr 03 '11 at 22:00
  • @Gordon: Regex _cannot_ parse HTML. Regex can parse strict, specific subsets of HTML; i.e. text strings that look like HTML because they are specific examples of HTML. Regex can parse stuff out of them. – Lightness Races in Orbit Apr 03 '11 at 22:04
  • @Tomalak http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491 – Gordon Apr 03 '11 at 22:07
  • @Gordon: o.O Perhaps, then, I should add "unless you go beyond the realm of sanity and write an illegible 90-line Perl script" – Lightness Races in Orbit Apr 03 '11 at 22:12
  • @Punkis see the accepted answer to [Regex for grabbing the href attribute of an a element](http://stackoverflow.com/questions/3820666/regular-expression-for-grabbing-the-href-attribute-of-an-a-element). I take you are smart enough to substitute a for img and href for src. – Gordon Apr 03 '11 at 22:13
  • @Tomalak that's what I said right from the start: not practical ;) – Gordon Apr 03 '11 at 22:16
  • @Gordon: OK then. I think for simplicity's sake I'll just continue telling people "don't do this" (even if not quite "you can't do this"). – Lightness Races in Orbit Apr 03 '11 at 22:17
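
Several comments above point towards an HTML parser rather than a regex. A minimal sketch of that route, assuming PHP's bundled DOM extension is available; the sample markup and the `$images` variable are made up for illustration:

```php
<?php
// Hypothetical sample input standing in for $content.
$content = '<p><a href="http://example.com/a.jpg">link</a>'
         . '<img src="https://example.com/b.png" alt="">'
         . '<img src="/local/c.gif"></p>';

$doc = new DOMDocument();
// Collect libxml parse errors internally instead of emitting PHP warnings,
// since real-world markup is rarely well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

$images = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    // Keep only absolute http/https URLs that end in an image extension.
    if (preg_match('~^https?://.+\.(?:jpe?g|png|gif)$~i', $src)) {
        $images[] = $src;
    }
}

print_r($images);
```

Here the parser locates the src attributes, and the regex is only used to check the shape of each URL, so links in href attributes never come into play.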

2 Answers

You can try this one, which matches only img tags and captures the image URL from the src attribute:

<img[^>]+src\s*=\s*['\"]([^'\"]+)['\"][^>]*>
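
PHP's preg functions need the pattern wrapped in delimiters, so the expression above cannot be passed as-is. A minimal sketch of how it might be used, assuming `~` as the delimiter and a made-up sample string in place of the real `$content`:

```php
<?php
// Hypothetical sample input; in the question, $content would be the page HTML.
$content = '<a href="http://example.com/a.jpg">link</a>'
         . '<img class="x" src="https://example.com/b.png" alt="">'
         . "<img src = 'http://example.com/c.GIF'>";

// "~" delimiters do not clash with the "<" and ">" inside the pattern;
// the "i" modifier makes the match case-insensitive.
$pattern = '~<img[^>]+src\s*=\s*[\'"]([^\'"]+)[\'"][^>]*>~i';

preg_match_all($pattern, $content, $matches);

// $matches[1] holds the captured src values of every img tag found.
$srcs = $matches[1];

// Optionally keep only http/https URLs that end in an image extension.
$images = array_values(preg_grep('~^https?://.+\.(?:jpe?g|png|gif)$~i', $srcs));

print_r($images);
```

The second step is separate on purpose: the first pattern finds src values of any kind, and the filter then applies the extension and scheme rules from the question.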
pcofre
  • Hello! `$pattern = "<img[^>]+src\s*=\s*['\"]([^'\"]+)['\"][^>]*>";` `preg_match_all($pattern, $content, $filtered);` I get this error: Warning: preg_match_all() [function.preg-match-all]: Unknown modifier ']' – EnexoOnoma Apr 03 '11 at 21:58
  • @Punkis: You need to wrap the regex in delimiters. – ridgerunner Apr 03 '11 at 22:04

Prefix your regex with a negative lookbehind assertion to filter out the common occurrences:

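 (?<!src=["\']|src=)

"Prefixing" means after the opening delimiter and before the https?://. Because the assertion itself contains a !, the delimiter has to change from ! to another character such as ~, otherwise PHP treats the ! inside the assertion as the end of the pattern:

preg_match('~(?<!src=["\']|src=)https?://.+\.(?:jpe?g|png|gif)~Ui' , $content , $matches);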
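
Note that a negative lookbehind rejects URLs that are preceded by src=. If the goal is the opposite, as the question asks, the positive form `(?<=...)` may be closer to what is wanted. A minimal sketch under that assumption, using `~` as the delimiter, a made-up sample string, and a stricter character class in place of `.+` so a match stays inside one attribute value:

```php
<?php
// Hypothetical sample input standing in for $content.
$content = '<a href="http://example.com/a.jpg">link</a>'
         . '<img src="https://example.com/b.png">'
         . '<img src=http://example.com/c.gif>';

// (?<=...) requires the URL to be preceded by src=" or src=' or a bare src=.
// [^\s"'<>]+? stops the match at whitespace, quotes, or angle brackets.
$pattern = '~(?<=src=["\']|src=)https?://[^\s"\'<>]+?\.(?:jpe?g|png|gif)~i';

preg_match_all($pattern, $content, $matches);

// $matches[0] holds the full matches, i.e. the image URLs themselves.
print_r($matches[0]);
```

The href link is skipped because it is not preceded by src=, while both the quoted and the unquoted src values are picked up.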
mario
  • Hi there. `preg_match_all('(?<!src=["\']|src=)', $content, $filtered);` Warning: preg_match_all() [function.preg-match-all]: Compilation failed: nothing to repeat at offset 0 – EnexoOnoma Apr 03 '11 at 22:16
  • I'm not providing tutoring in the comments. – mario Apr 03 '11 at 22:30
  • Yes I can. As counterperformance I request that you go over your /past/ questions, and utilize the ▲ upvote arrow on questions where additional help was provided in comments. Come back when you reached your vote limit. – mario Apr 03 '11 at 22:37