
Another regex problem. This is my HTML input:

Some text<br />

        This is really important text <br />

  This is another important text what I need.<br />

Please help me extract the important text from this code. Also, could you suggest some good resources on regular expressions? I feel pretty bad about asking you again.

SENorth
  • Your problem is that you are trying to use a regular expression for a language that is not regular! http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – johnsyweb Aug 11 '11 at 22:17
  • What marks the important text? Is it everything after the first `<br />`?
    – Jonathan M Aug 11 '11 at 22:17
  • The question doesn't state exactly what you want to get here. Do you want to check whether a certain string is in the HTML? Do you want to get the contents of a certain HTML tag? For either of those, cwallenpoole is right: don't use regex. – Matt R. Wilson Aug 11 '11 at 22:19
  • I want to get all the content after the first `<br />`.
    – SENorth Aug 11 '11 at 22:21

2 Answers


Start with DOMDocument::loadHTML. Then take a look at the options available to you here (Document Object Model).
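
As a rough illustration of that suggestion, here is a minimal sketch that loads the fragment with DOMDocument and collects the text nodes that come after the first `<br />`. The function name `textAfterFirstBr` and the wrapper `<div id="wrap">` are assumptions made for this example, not part of the question, and character encoding is not handled.

```php
<?php
// Hypothetical helper: returns the trimmed, non-empty text nodes
// that appear after the first <br> in an HTML fragment.
function textAfterFirstBr(string $html): array
{
    $doc = new DOMDocument();

    // Fragments are rarely valid documents, so collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // The wrapper div (an assumption) gives the XPath query a known parent.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];

    // Text nodes directly inside the wrapper that have a <br> somewhere before them.
    foreach ($xpath->query('//div[@id="wrap"]/text()[preceding-sibling::br]') as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $texts[] = $text;
        }
    }

    return $texts;
}
```

Called with the input from the question, this should return the two "important" strings as separate array elements, with the surrounding whitespace trimmed.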

Vivin Paliath

You have PHP tagged, so are you using the PHP regex functions to do the parsing? If so, I can help you write a regular expression to match between the `<br />` tags. The first match is your first "important text" and the second match is your second.

Try something like this:

$result = preg_match("/<br \/>(.*)<br \/>/", $input, $_matches);
user783437
  • Four problems here: (1) There is no match at all without the dotall modifier. (2) Once that is fixed, everything up to the last `<br />` is matched because `.*` is greedy. (3) Once that is fixed, there is still no second match with `preg_match` (maybe you meant `preg_match_all`?). (4) Even then it wouldn't match the second "important text", because the needed leading `<br />` has already been consumed by the first match.
    – stema Aug 12 '11 at 05:37
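
For reference, the four points in the comment above could be addressed roughly like this. It is only a sketch: the function name `importantTextByRegex` is hypothetical, and the DOM approach from the other answer is still the more robust choice for real HTML. The `s` modifier lets `.` match newlines, `.*?` is lazy, `preg_match_all` collects every match, and the lookahead `(?=...)` leaves the closing `<br />` unconsumed so it can open the next match.

```php
<?php
// Hypothetical helper: captures the text between consecutive <br> tags.
function importantTextByRegex(string $input): array
{
    // ~ is used as the delimiter so the slash in <br /> needs no escaping.
    // i: case-insensitive tag name, s: dot also matches newlines.
    $pattern = '~<br\s*/?>(.*?)(?=<br\s*/?>)~is';

    preg_match_all($pattern, $input, $matches);

    // $matches[1] holds the first capture group of every match.
    return array_map('trim', $matches[1]);
}
```

Note that text after the last `<br />` is not captured by this pattern, because the lookahead requires another `<br />` to follow.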