preg_match( '/<title>(.*)<\/title>/',.....)
preg_match("/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i",....)
Felix Kling

runeveryday
-
Looks like they would extract information from an HTML page: the title and the addresses of images. – Felix Kling Feb 10 '11 at 08:43
2 Answers
6
The first extracts the contents of an HTML `<title>` tag.
The second extracts images' `src` attributes from an HTML document, but is very imperfect (it won't catch references to image resources that end in `.jpeg` or have no extension at all).
Regular expressions are not a good idea for parsing HTML! They are far from fireproof; one should use an HTML parser instead.
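As a rough sketch of the parser-based alternative, the following uses PHP's bundled DOM extension to pull out the same two pieces of information. The sample markup and the variable names are made up for illustration, and this is only one possible way to do it:

```php
<?php
// Hypothetical sample markup; stands in for whatever page is being processed.
$html = '<html><head><title>Example page</title></head>'
      . '<body><img src="a.png"><img src=\'photos/b.jpeg\'><img src="/render?id=3"></body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them for sloppy markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Title: text of the first <title> element, or null if there is none.
$titleNodes = $doc->getElementsByTagName('title');
$title = $titleNodes->length > 0 ? $titleNodes->item(0)->textContent : null;

// Image sources: every <img> with a src attribute, whatever its extension.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        $sources[] = $img->getAttribute('src');
    }
}
```

Unlike the second regex, this also picks up the `.jpeg` image and the one with no extension at all.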
-
@Pekka Yes, always tell'em [to not do that](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html). +1 – Linus Kleen Feb 10 '11 at 08:45
-
Note that the first regex will fail if the line (or, in multi-line mode, the _entire document_) has multiple `<title>` elements. That may be unlikely for this specific case but in general produces very bad results. – Chris Lutz Feb 10 '11 at 08:48
-
Why the edit? `The regexes will probably both do a half-way decent job - if part of an existing project, you can probably leave them be. But they are far from fireproof, and if you're building stuff from scratch, don't use this approach.` Most people will continue to use bad code, but they should be encouraged to fix it instead. – beggs Feb 10 '11 at 08:49
-
Also I think the second regex is terrible too, for the same reason. It's very lazy about validating what can and can't be in a string and may grab too much, unless I'm badly mistaken. – Chris Lutz Feb 10 '11 at 08:51
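To make the "grabs too much" problem concrete, here is a small sketch run against made-up single-line inputs. The `(.*?)` variant is shown only for contrast; it is not offered as a fix for parsing HTML in general:

```php
<?php
// Hypothetical input with two <title> elements on one line.
$html = '<title>first</title><p>text</p><title>second</title>';

// Greedy (.*) runs on to the LAST </title> on the line.
preg_match('/<title>(.*)<\/title>/', $html, $greedy);

// Lazy (.*?) stops at the FIRST </title>.
preg_match('/<title>(.*?)<\/title>/', $html, $lazy);

// The second regex from the question has the same greedy .* in it.
$line = '<img src="a.png" alt="x"> <img src="b.gif">';
preg_match('/src=["\']?([^"\']?.*(png|jpg|gif))["\']?/i', $line, $img);

// $greedy[1] is 'first</title><p>text</p><title>second'
// $lazy[1]   is 'first'
// $img[1]    is 'a.png" alt="x"> <img src="b.gif'
```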
-
@beggs I'd say it depends on the situation. If it's a newbie finding his way through production code, it won't be their first priority. In general however, you're right, edited that out. @Chris good points! – Pekka Feb 10 '11 at 08:51
-
@runeveryday they are patterns and delimiters. `(.*)<\/title>` means "grab everything up to the next occurrence of `</title>`" and return it as part of the result. The `/` is used as a delimiter around the expression. There's some more info here: http://www.regular-expressions.info/php.html – Pekka Feb 10 '11 at 09:12
-
Thank you, I know, but about the second line of code: why is there no / delimiter after `]?/i`? – runeveryday Feb 10 '11 at 09:17
-
@runeveryday `i` is a flag that comes after the delimiter, specifying case insensitive search (in order to also catch `JPG`, `GIF` ....) – Pekka Feb 10 '11 at 09:17
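A short sketch of how the delimiter and the trailing flag behave, using made-up inputs and simplified patterns rather than the exact ones from the question:

```php
<?php
// Hypothetical upper-case markup.
$tag = '<IMG SRC="LOGO.PNG">';

// No flag: the lower-case pattern does not match upper-case markup.
$caseSensitive = preg_match('/src=.*png/', $tag);

// The i after the closing delimiter switches on case-insensitive matching.
$caseInsensitive = preg_match('/src=.*png/i', $tag);

// The delimiter need not be /; with # the slash in </title> needs no escaping.
$withHash = preg_match('#<title>(.*)</title>#', '<title>foo</title>', $m);

// $caseSensitive is 0, $caseInsensitive is 1, $withHash is 1, $m[1] is 'foo'
```

So in `]?/i` the `/` is the closing delimiter, and the `i` that follows is a modifier, not part of the pattern.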
0
1) Matches anything between `<title>` and `</title>`, a la an HTML page's title, so running it against `<title>foo</title>` results in the match being `foo`.
2) Matches any string following `src=` that ends in `png`, `jpg` or `gif`. Used to extract the URLs of images in HTML code.
Per @Pekka's answer: don't do this in real world code.
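For illustration only, here is what the two calls might look like with their elided arguments filled in. The subject strings and the `$t` / `$hit` variables are assumptions, since the question omits them:

```php
<?php
// 1) The capture group ends up in index 1 of the matches array.
preg_match('/<title>(.*)<\/title>/', '<title>foo</title>', $t);

// 2) Same pattern as in the question, applied to two hypothetical tags.
$pattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";
preg_match($pattern, '<img src="pics/cat.JPG">', $hit);
$missed = preg_match($pattern, '<img src="pics/cat.jpeg">');

// $t[1]   is 'foo'
// $hit[1] is 'pics/cat.JPG' and $hit[2] is 'JPG'
// $missed is 0, because "jpeg" does not contain png, jpg or gif
```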

beggs