0
  1. preg_match( '/<title>(.*)<\/title>/',.....)

  2. preg_match("/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i",....)

Felix Kling
runeveryday

2 Answers

6

The first is to extract the contents of an HTML title tag.

The second is to extract images' src attributes from an HTML document, but it is very imperfect: it won't catch references to image resources that end in .jpeg or have no extension at all.

Regular expressions are not a good idea for parsing HTML! They are far from fireproof. One should use an HTML parser instead.
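As a rough illustration of the parser approach, here is a minimal sketch using PHP's bundled DOM extension (`DOMDocument`). The sample markup and the variable names are made up for this example, and real pages may need more careful encoding and error handling than shown here.

```php
<?php
// Hypothetical sample document, not taken from the question.
$html = '<html><head><title>Example page</title></head>'
      . '<body><img src="a.png"><img src=\'photos/b.jpeg\'><img src="/render?id=3"></body></html>';

$doc = new DOMDocument();
// Real-world HTML is often invalid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Title: text content of the first <title> element, if there is one.
$titles = $doc->getElementsByTagName('title');
$title = $titles->length > 0 ? $titles->item(0)->textContent : null;

// Image sources: the src attribute of every <img>, whatever its extension.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        $sources[] = $img->getAttribute('src');
    }
}

var_dump($title, $sources);
```

Unlike the second regex, this also picks up the `.jpeg` file and the extensionless URL, because it reads the attribute rather than guessing from the file name.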

Pekka
  • @Pekka Yes, always tell'em [to not do that](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html). +1 – Linus Kleen Feb 10 '11 at 08:45
  • Note that the first regex will fail if the line (or, in multi-line mode, the _entire document_) has multiple `<title>` elements. That may be unlikely for this specific case but in general produces very bad results. – Chris Lutz Feb 10 '11 at 08:48
  • why the edit? `The regexes will probably both do a half-way decent job - if part of an existing project, you can probably leave them be. But they are far from fireproof, and if you're building stuff from scratch, don't use this approach.` Most people will continue to use bad code but they should be encouraged to fix it instead. – beggs Feb 10 '11 at 08:49
  • Also I think the second regex is terrible too for the same reason. It's very lazy about validating what can and can't be in a string and may grab too much unless I'm badly mistaken. – Chris Lutz Feb 10 '11 at 08:51
  • @beggs I'd say it depends on the situation. If it's a newbie finding his way through production code, it won't be their first priority. In general however, you're right, edited that out. @Chris good points! – Pekka Feb 10 '11 at 08:51
  • I want to know what these symbols (`/`, `(.*)`, `\`) mean. – runeveryday Feb 10 '11 at 09:10
  • @runeveryday they are patterns and delimiters. `(.*)<\/title>` means "grab everything up to the next occurrence of `</title>` and return it as part of the result". The `/` is used as a delimiter around the expression. There's some more info here: http://www.regular-expressions.info/php.html – Pekka Feb 10 '11 at 09:12
  • Thank you, I see. But in the second line of code, why is there no `/` delimiter after `]?/i`? – runeveryday Feb 10 '11 at 09:17
  • @runeveryday `i` is a flag that comes after the delimiter, specifying case insensitive search (in order to also catch `JPG`, `GIF` ....) – Pekka Feb 10 '11 at 09:17
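
The points raised in the comments above (greedy matching, over-grabbing, and the `i` flag) can be checked with a short script. This is only a sketch: the input strings are invented for the example, and the lazy `(.*?)` variant is an addition for comparison, not something either answer proposes.

```php
<?php
// Hypothetical one-line input with two <title> elements.
$html = '<title>First</title><p>text</p><title>Second</title>';

// Greedy (.*) runs to the LAST </title> on the line.
preg_match('/<title>(.*)<\/title>/', $html, $greedy);
// $greedy[1] is 'First</title><p>text</p><title>Second'

// Lazy (.*?) stops at the first </title> (added here for comparison only).
preg_match('/<title>(.*?)<\/title>/', $html, $lazy);
// $lazy[1] is 'First'

// The second regex from the question over-grabs in the same way.
$srcPattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";
preg_match($srcPattern, '<img src="a.png"><img src="b.gif">', $src);
// $src[1] is 'a.png"><img src="b.gif'

// The trailing i makes the match case-insensitive; the final / is still the delimiter.
$withFlag = preg_match($srcPattern, '<img src="PHOTO.JPG">');
$withoutFlag = preg_match("/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/", '<img src="PHOTO.JPG">');

var_dump($greedy[1], $lazy[1], $src[1], $withFlag, $withoutFlag);
```

Even the lazy version is only a patch; it still knows nothing about attributes on the tag, comments, or line breaks inside the element.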
0

1) Matches anything between `<title>` and `</title>`, i.e. an HTML page's title, so running it against `<title>foo</title>` captures `foo` in group 1.

2) Matches any string following `src=` that ends in `png`, `jpg` or `gif`. Used to extract the URLs of images in HTML code.

Per @Pekka's answer: don't do this in real world code.
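
For reference, here is a small sketch of what the two calls return. The patterns are the ones from the question; the input strings and variable names are made up, since the question elides the remaining arguments.

```php
<?php
// The two patterns from the question; the sample inputs are invented.
$titlePattern = '/<title>(.*)<\/title>/';
$srcPattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";

// 1) Group 0 is the whole match, group 1 is the text between the tags.
preg_match($titlePattern, '<title>foo</title>', $t);
// $t[0] is '<title>foo</title>', $t[1] is 'foo'

// 2) Group 1 is the captured URL, group 2 the extension that matched.
preg_match($srcPattern, '<img src="images/logo.png" alt="Logo">', $s);
// $s[1] is 'images/logo.png', $s[2] is 'png'

// A .jpeg file is not matched at all, so preg_match() returns 0.
$jpegHits = preg_match($srcPattern, '<img src="photo.jpeg">');

var_dump($t, $s, $jpegHits);
```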

beggs