I'm sure this is an easy one, and as much as I've googled and searched here on SO - I can't seem to figure out what is wrong with this. I have other areas on this page where I use similar expressions that return exactly what I want.

However, I can't get this particular bit to return what I want, so maybe someone can help me.

I have a div with a specific ID "user-sub-commhome" - I want to pull out the text from within that div. The text is surrounded by tags but I can easily use strip_tags to get those gone. I'm using regex to try and pull the data out.

Here is my code:

$intro = "<div id="user-sub-summary">Summary</div>
<div id="user-sub-commhome"><em>Commercial</em></div>
<div id="whatever">whatever</div>";

$regex = '#\<div id="user-sub-commhome"\>(.+?)\<\/div\>#s';
preg_match($regex, $intro, $matches);
$match = $matches[0];
echo $match;

I've tried changing things with no success, nothing seems to work to echo anything. So I'm hoping some power that be who is much more experienced with regex can help.

Hanny
  • Not sure if this is just sample code, but your $intro variable is not correct, since it's not properly being escaped. – Devator Aug 16 '11 at 14:35
  • I would suggest that you try using an HTML parser instead of regex for this task. See http://stackoverflow.com/q/1732348/159388. – murgatroid99 Aug 16 '11 at 14:37
  • Yes, this is just sample code. $intro is actually a big chunk of HTML - I was just giving those as an example so people could see what I was talking about a bit more clearly. – Hanny Aug 16 '11 at 14:38
  • Your pattern works fine on that HTML, leaving aside the PHP syntax error Devator noticed. Escape the double quotes. – Hnatt Aug 16 '11 at 14:39

3 Answers

Your code works for me if you change the enclosing double quotes around $intro to single quotes:

$intro = '<div id="user-sub-summary">Summary</div>
<div id="user-sub-commhome"><em>Commercial</em></div>
<div id="whatever">whatever</div>';

$regex = '#\<div id="user-sub-commhome"\>(.+?)\<\/div\>#s';
preg_match($regex, $intro, $matches);
$match = $matches[0];
echo $match;

You might want to read some famous advice on regular expressions and HTML.
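
For comparison, here is a minimal sketch of the parser-based approach using PHP's bundled DOM extension. It assumes the markup is roughly well-formed and that the id appears once; the variable names are only illustrative, and it is not the code from the question.

```php
<?php
// Sample markup copied from the question.
$intro = '<div id="user-sub-summary">Summary</div>
<div id="user-sub-commhome"><em>Commercial</em></div>
<div id="whatever">whatever</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($intro);

// Look the div up by its id attribute.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="user-sub-commhome"]');

// textContent gives the text with the nested tags already removed,
// so no strip_tags() call is needed here.
$text = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $text;
```

With the sample markup this should print Commercial, and it keeps working if the div gains extra attributes or nested divs, which is where the regex tends to break.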

Otterfan

I won't explain why using regular expressions to parse HTML is a bad idea. I think the problem here is that you don't have error_reporting activated, or you're simply not looking at your error log. Defining the $intro string the way you do should cause errors (unexpected whatever / unterminated string). It should look like this:

$intro = "<div id=\"user-sub-summary\">Summary</div>
<div id=\"user-sub-commhome\"><em>Commercial</em></div>
<div id=\"whatever\">whatever</div>";

or this:

$intro = '<div id="user-sub-summary">Summary</div>
<div id="user-sub-commhome"><em>Commercial</em></div>
<div id="whatever">whatever</div>';

If you're using double quotes inside a double-quoted string, you have to escape them with a backslash (\). Another way would be to use single quotes for the string (like in my second example).
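
A third option, not mentioned above, is nowdoc syntax, which behaves like a single-quoted string and avoids the quoting question for larger blocks of HTML. A small sketch, where the HTML label is an arbitrary name chosen for this example:

```php
<?php
// Nowdoc: like a single-quoted string, so the double quotes inside
// need no escaping and no variables are interpolated.
$intro = <<<'HTML'
<div id="user-sub-summary">Summary</div>
<div id="user-sub-commhome"><em>Commercial</em></div>
<div id="whatever">whatever</div>
HTML;

// The attribute quotes survive untouched.
var_dump(strpos($intro, '<div id="user-sub-commhome">') !== false);
```

The closing HTML; label is kept at the start of its own line so the snippet also parses on PHP versions older than 7.3.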

oezi
  • Thanks - I should've clarified in the original one that the block of HTML was just for example purposes... My code actually looks like this: $intro = $form->data['introtext']; That just brings in a big block of HTML - I have other regex expressions that are written virtually the same way (except they say "user-sub-summary" or whatever ID I'm trying to grab) and they all work. Was just trying to see if anyone could find fault with this one... – Hanny Aug 16 '11 at 14:53

In your sample code, $matches[0] contains the entire matched text, not the capturing group. The capturing group is in $matches[1].
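
A short sketch of the difference, using the sample markup from the question with single quotes around the string; the names $whole, $inner and $text are made up for this example:

```php
<?php
$intro = '<div id="user-sub-summary">Summary</div>
<div id="user-sub-commhome"><em>Commercial</em></div>
<div id="whatever">whatever</div>';

// Same pattern as in the question, without the unneeded backslashes.
$regex = '#<div id="user-sub-commhome">(.+?)</div>#s';

$text = null;
if (preg_match($regex, $intro, $matches)) {
    $whole = $matches[0];       // the full match, including the <div> tags
    $inner = $matches[1];       // only what the parentheses captured
    $text  = strip_tags($inner); // drop the <em> wrapper
    echo $text;
} else {
    echo 'No match';
}
```

Checking the return value of preg_match before reading $matches also avoids an undefined-index notice when the div is missing from the real $intro.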

Hnatt