-1

I'm using PHP's preg_match function...

  1. How can I fetch the text between tags? The following attempt doesn't fetch the value: preg_match("/^<title>(.*)<\/title>$/", $originalHTMLBlock, $textFound);

  2. How can I find the first occurrence of the following element and fetch its contents (Bunch of Texts and Tags)?

    <div id="post_message_">Bunch of Texts and Tags</div>

user311509
  • You can use https://gist.github.com/1358174 and then `//title` for 1) and `//div[@id="post_message_"]` for 2). Also see http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662 – Gordon Jan 16 '12 at 10:58

2 Answers

3

This is starting to get boring. Regex is likely not the tool of choice for matching languages like HTML, and there are thousands of similar questions on this site to prove it. I'm not going to link to the answer everyone else always links to - do a little search and see for yourself.

That said, the ^ and $ anchors in your first regex mean it only matches if the <title> element is the entire input. I suspect that that's not the case. So

preg_match("#<title>(.*?)</title>#", $originalHTMLBlock, $textFound);

has a bit more of a chance of working. Note the lazy quantifier, which becomes important if there is more than one closing tag in your input; that might be unlikely for <title> but not for <div>.
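
To make the greedy/lazy difference concrete, here is a minimal sketch. The input string and the variable names $html, $greedy and $lazy are made up for illustration and are not from the question:

```php
<?php
// Hypothetical input: two <div> elements on one line.
$html = '<div>first</div><div>second</div>';

// Greedy: .* runs on to the LAST </div>, swallowing the markup in between.
preg_match('#<div>(.*)</div>#', $html, $greedy);

// Lazy: .*? stops at the FIRST </div> it can reach.
preg_match('#<div>(.*?)</div>#', $html, $lazy);

echo $greedy[1], "\n"; // first</div><div>second
echo $lazy[1], "\n";   // first
```

Neither variant matches across line breaks here, because . does not match a newline unless the s modifier is added.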

For your second question, regex only has a chance of working if there are no nested <div> tags inside the one you're looking for. If that's the case, then

preg_match("#<div id=\"post_message_\">(.*?)</div>#", $originalHTMLBlock, $textFound);

might work.

But all in all, you'd better be using an HTML parser.
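
For comparison, a parser-based version of both lookups might look roughly like this. It is only a sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath); the sample markup is invented, and $originalHTMLBlock simply reuses the variable name from the question:

```php
<?php
// Hypothetical sample input, standing in for the real page.
$originalHTMLBlock = '<html><head><title>Example page</title></head><body>'
    . '<div id="post_message_">Bunch of <b>Texts</b> and <div>Tags</div></div>'
    . '</body></html>';

$doc = new DOMDocument();
// Collect parse warnings internally instead of emitting them for sloppy markup.
libxml_use_internal_errors(true);
$doc->loadHTML($originalHTMLBlock);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// 1) Text of the first <title> element, or null if there is none.
$titleNode = $xpath->query('//title')->item(0);
$title = $titleNode !== null ? $titleNode->textContent : null;

// 2) First <div> whose id attribute is exactly "post_message_".
$divNode = $xpath->query('//div[@id="post_message_"]')->item(0);

// Serialize the child nodes to get the inner HTML, nested tags included.
$innerHtml = '';
if ($divNode !== null) {
    foreach ($divNode->childNodes as $child) {
        $innerHtml .= $doc->saveHTML($child);
    }
}

echo $title, "\n";
echo $innerHtml, "\n";
```

Unlike the regex, this keeps working when the target <div> contains other <div> elements. If the real ids carry a suffix after the underscore (an assumption, the question doesn't say), the XPath predicate starts-with(@id, "post_message_") would be the thing to try instead of the exact comparison.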

Tim Pietzcker
0
  1. Use this: <title\b[^>]*>(.*?)</title> (are you sure you need ^ and $?)
  2. You can use the same kind of expression, <div\b[^>]*>(.*?)</div>, assuming there is no </div> tag inside your Bunch of Texts and Tags text. If there is, maybe you should take a look at http://code.google.com/p/phpquery/
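
A quick way to try both patterns is sketched below. The sample input is invented, and the i and s modifiers are an addition that the patterns above don't include: i ignores tag case, s lets . match newlines.

```php
<?php
// Hypothetical input: tags carry attributes and the div content spans two lines.
$html = "<title lang=\"en\">Example page</title>\n"
      . "<div id=\"post_message_\" class=\"post\">Bunch of\nTexts and Tags</div>";

// \b[^>]* tolerates any attributes inside the opening tag.
preg_match('#<title\b[^>]*>(.*?)</title>#is', $html, $title);
preg_match('#<div\b[^>]*>(.*?)</div>#is', $html, $div);

echo $title[1], "\n"; // Example page
echo $div[1], "\n";   // the two-line div content
```

Note that the <div> pattern matches the first <div> of any kind, not specifically the one with id="post_message_".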
Vlad Jula-Nedelcu