Very simple editing in a regular expression pattern

Question

How can I transform this regex so that it searches only inside src=* attributes, instead of matching every link that starts with http and ends with jpg, png, or gif? I also want it to match https images. Thank you!

preg_match('!http://.+\.(?:jpe?g|png|gif)!Ui' , $content , $matches);

*(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) — Gordon, Apr 03 '11 at 21:40
And for the comedy value... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — John Parker, Apr 03 '11 at 21:42
@Tomalak it is comedy because apart from the funny rant, the answer is wrong. Regex can parse HTML. It's just not practical to do so in most cases when there are parsers readily available. — Gordon, Apr 03 '11 at 22:00
@Gordon: Regex _cannot_ parse HTML. Regex can parse strict, specific subsets of HTML; i.e. text strings that look like HTML because they are specific examples of HTML. Regex can parse stuff out of them. — Lightness Races in Orbit, Apr 03 '11 at 22:04
@Tomalak http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491 — Gordon, Apr 03 '11 at 22:07
@Gordon: o.O Perhaps, then, I should add "unless you go beyond the realm of sanity and write an illegible 90-line Perl script" — Lightness Races in Orbit, Apr 03 '11 at 22:12
@Punkis see the accepted answer to [Regex for grabbing the href attribute of an a element](http://stackoverflow.com/questions/3820666/regular-expression-for-grabbing-the-href-attribute-of-an-a-element). I take you are smart enough to substitute a for img and href for src. — Gordon, Apr 03 '11 at 22:13
@Tomalak that's what I said right from the start: not practical ;) — Gordon, Apr 03 '11 at 22:16
@Gordon: OK then. I think for simplicity's sake I'll just continue telling people "don't do this" (even if not quite "you can't do this"). — Lightness Races in Orbit, Apr 03 '11 at 22:17
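
For reference, the parser-based route recommended in the comments could look roughly like the sketch below. It assumes PHP's bundled DOM extension is available. The sample markup, the `$content` value, and the helper name `imageSources()` are made up for illustration and are not taken from the thread.

```php
<?php
// Hypothetical sample markup; $content stands in for the page being scanned.
$content = '<p><a href="http://example.com/big.jpg">link</a>'
         . '<img alt="a" src="https://example.com/a.png">'
         . "<img src='http://example.com/b.JPG'></p>";

/**
 * Return the src of every <img> whose URL is http(s) and ends in jpg/jpeg/png/gif.
 * imageSources() is an illustrative helper name, not something from the thread.
 */
function imageSources(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // The regex only validates a single attribute value, not the HTML itself.
        if (preg_match('~^https?://.+\.(?:jpe?g|png|gif)$~i', $src)) {
            $urls[] = $src;
        }
    }
    return $urls;
}

$images = imageSources($content);
print_r($images);
```

With the sample input, only the two `img` sources are returned; the `.jpg` URL inside the `href` is ignored because the loop never looks at `a` elements.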

score 0 · Answer 1 · answered Apr 03 '11 at 21:52

You can try this one, which matches only img tags and captures the image URL from the src attribute:

<img[^>]+src\s*=\s*['\"]([^'\"]+)['\"][^>]*>

answered Apr 03 '11 at 21:52

pcofre

Hello! `$pattern = "<img[^>]+src\s*=\s*['\"]([^'\"]+)['\"][^>]*>";` `preg_match_all($pattern, $content, $filtered);` I get this error: Warning: preg_match_all() [function.preg-match-all]: Unknown modifier ']' – EnexoOnoma Apr 03 '11 at 21:58
@Punkis: You need to wrap the regex in delimiters. – ridgerunner Apr 03 '11 at 22:04
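
A minimal sketch of the delimiter fix, assuming `~` as the delimiter (any character that does not occur in the pattern would do) and an added `i` modifier for case-insensitive matching. Without explicit delimiters, PHP takes the leading `<` as the opening delimiter and the first `>` as the closing one, so everything from `]` onward is read as modifiers, which explains the warning. The sample `$content` is hypothetical.

```php
<?php
// Hypothetical sample markup standing in for $content.
$content = '<a href="http://example.com/skip.jpg">x</a>'
         . '<img class="c" src="https://example.com/pic.gif" alt="">';

// "~" is the delimiter; "i" after the closing "~" makes the match case-insensitive.
// Group 1 captures the value of the src attribute.
$pattern = '~<img[^>]+src\s*=\s*[\'"]([^\'"]+)[\'"][^>]*>~i';

preg_match_all($pattern, $content, $filtered);

// $filtered[0] holds the full <img ...> tags, $filtered[1] the captured URLs.
print_r($filtered[1]);
```

Here `$filtered[1]` contains only `https://example.com/pic.gif`; the URL in the `href` is not matched because the pattern requires an `img` tag.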

mario · Accepted Answer · 2011-04-03T23:09:33.507

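Prefix your regex with a negative assertion to filter out the common occurrences:

 (?<!src=["\']|src=)

"Prefixing" means after the opening delimiter and before the https?://.. Because the assertion itself contains a !, the pattern can no longer use ! as its delimiter (PHP would end the pattern at that character), so switch to another one such as ~:

preg_match('~(?<!src=["\']|src=)https?://.+\.(?:jpe?g|png|gif)~Ui' , $content , $matches);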
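
The sketch below shows how the lookbehind behaves on a hypothetical sample. Two things in it are assumptions rather than part of the answer: it uses `~` as the delimiter, since the `!` inside the assertion would otherwise be read as the closing delimiter, and it adds a positive variant `(?<=...)` for comparison. The negative form skips URLs that directly follow `src=`, while the positive form keeps only those, which appears to be closer to what the question asks for.

```php
<?php
// Hypothetical sample markup standing in for $content.
$content = '<a href="http://example.com/linked.jpg">x</a> '
         . '<img src="https://example.com/shown.png">';

// Negative lookbehind, as in the answer: skips URLs directly preceded by src=, src=" or src='.
$negative = '~(?<!src=["\']|src=)https?://.+\.(?:jpe?g|png|gif)~Ui';

// Positive lookbehind (assumed variant): keeps only URLs directly preceded by src=" or src='.
$positive = '~(?<=src=["\'])https?://.+\.(?:jpe?g|png|gif)~Ui';

preg_match_all($negative, $content, $outsideSrc);
preg_match_all($positive, $content, $insideSrc);

print_r($outsideSrc[0]); // URLs that do NOT follow src=
print_r($insideSrc[0]);  // URLs that DO follow src=" or src='
```

On this input the negative pattern yields `http://example.com/linked.jpg` and the positive one yields `https://example.com/shown.png`. Neither handles whitespace around the `=` or unquoted attribute values in the positive case, so treat both as rough filters.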

edited Apr 03 '11 at 23:09

answered Apr 03 '11 at 22:06

mario

Hi there. `preg_match_all('(?<!src=["\']|src=)', $content, $filtered);` Warning: preg_match_all() [function.preg-match-all]: Compilation failed: nothing to repeat at offset 0 – EnexoOnoma Apr 03 '11 at 22:16
I'm not providing tutoring in the comments. – mario Apr 03 '11 at 22:30
Yes I can. As counterperformance I request that you go over your /past/ questions, and utilize the ▲ upvote arrow on questions where additional help was provided in comments. Come back when you reached your vote limit. – mario Apr 03 '11 at 22:37
