
I have the following string:

$out = '
<li style="margin: 0px; padding: 0px; ">myspace&nbsp;<a href="http://www.google.com/search?hl=en&lr=&q=myspace" rel="gb_pageset[]" title="Results @ Google" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/g-tiny.jpg" alt="Results @ Google" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a>&nbsp;<a href="http://search.yahoo.com/search?p=myspace" rel="gb_pageset[]" title="Results @ Yahoo" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/y-tiny.jpg" alt="Results @ Yahoo" border="0" style="margin: 0px; padding: 0px; " width="16" height="16">&nbsp;</a><a href="http://search.msn.com/results.aspx?FORM=MSNH&srch_type=0&q=myspace" rel="gb_pageset[]" title="Results @ MSN" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/m-tiny.jpg" alt="Results @ MSN" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a></li>
<li style="margin: 0px; padding: 0px; ">google&nbsp;<a href="http://www.google.com/search?hl=en&lr=&q=google" rel="gb_pageset[]" title="Results @ Google" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/g-tiny.jpg" alt="Results @ Google" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a>&nbsp;<a href="http://search.yahoo.com/search?p=google" rel="gb_pageset[]" title="Results @ Yahoo" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/y-tiny.jpg" alt="Results @ Yahoo" border="0" style="margin: 0px; padding: 0px; " width="16" height="16">&nbsp;</a><a href="http://search.msn.com/results.aspx?FORM=MSNH&srch_type=0&q=google" rel="gb_pageset[]" title="Results @ MSN" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/m-tiny.jpg" alt="Results @ MSN" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a></li>
<li style="margin: 0px; padding: 0px; ">youtube&nbsp;<a href="http://www.google.com/search?hl=en&lr=&q=youtube" rel="gb_pageset[]" title="Results @ Google" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/g-tiny.jpg" alt="Results @ Google" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a>&nbsp;<a href="http://search.yahoo.com/search?p=youtube" rel="gb_pageset[]" title="Results @ Yahoo" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/y-tiny.jpg" alt="Results @ Yahoo" border="0" style="margin: 0px; padding: 0px; " width="16" height="16">&nbsp;</a><a href="http://search.msn.com/results.aspx?FORM=MSNH&srch_type=0&q=youtube" rel="gb_pageset[]" title="Results @ MSN" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/m-tiny.jpg" alt="Results @ MSN" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a></li>
<li style="margin: 0px; padding: 0px; ">ebay&nbsp;<a href="http://www.google.com/search?hl=en&lr=&q=ebay" rel="gb_pageset[]" title="Results @ Google" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/g-tiny.jpg" alt="Results @ Google" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a>&nbsp;<a href="http://search.yahoo.com/search?p=ebay" rel="gb_pageset[]" title="Results @ Yahoo" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/y-tiny.jpg" alt="Results @ Yahoo" border="0" style="margin: 0px; padding: 0px; " width="16" height="16">&nbsp;</a><a href="http://search.msn.com/results.aspx?FORM=MSNH&srch_type=0&q=ebay" rel="gb_pageset[]" title="Results @ MSN" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/m-tiny.jpg" alt="Results @ MSN" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a></li>
<li style="margin: 0px; padding: 0px; ">yahoo&nbsp;<a href="http://www.google.com/search?hl=en&lr=&q=yahoo" rel="gb_pageset[]" title="Results @ Google" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/g-tiny.jpg" alt="Results @ Google" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a>&nbsp;<a href="http://search.yahoo.com/search?p=yahoo" rel="gb_pageset[]" title="Results @ Yahoo" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/y-tiny.jpg" alt="Results @ Yahoo" border="0" style="margin: 0px; padding: 0px; " width="16" height="16">&nbsp;</a><a href="http://search.msn.com/results.aspx?FORM=MSNH&srch_type=0&q=yahoo" rel="gb_pageset[]" title="Results @ MSN" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/m-tiny.jpg" alt="Results @ MSN" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a></li>
<li style="margin: 0px; padding: 0px; ">craigslist&nbsp;<a href="http://www.google.com/search?hl=en&lr=&q=craigslist" rel="gb_pageset[]" title="Results @ Google" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/g-tiny.jpg" alt="Results @ Google" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a>&nbsp;<a href="http://search.yahoo.com/search?p=craigslist" rel="gb_pageset[]" title="Results @ Yahoo" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/y-tiny.jpg" alt="Results @ Yahoo" border="0" style="margin: 0px; padding: 0px; " width="16" height="16">&nbsp;</a><a href="http://search.msn.com/results.aspx?FORM=MSNH&srch_type=0&q=craigslist" rel="gb_pageset[]" title="Results @ MSN" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/m-tiny.jpg" alt="Results @ MSN" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a></li>
<li style="margin: 0px; padding: 0px; ">you tube&nbsp;<a href="http://www.google.com/search?hl=en&lr=&q=you%20tube" rel="gb_pageset[]" title="Results @ Google" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/g-tiny.jpg" alt="Results @ Google" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a>&nbsp;<a href="http://search.yahoo.com/search?p=you%20tube" rel="gb_pageset[]" title="Results @ Yahoo" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/y-tiny.jpg" alt="Results @ Yahoo" border="0" style="margin: 0px; padding: 0px; " width="16" height="16">&nbsp;</a><a href="http://search.msn.com/results.aspx?FORM=MSNH&srch_type=0&q=you%20tube" rel="gb_pageset[]" title="Results @ MSN" style="margin: 0px; padding: 0px; text-decoration: none; color: rgb(51, 102, 255); "><img src="http://www.howrank.com/images/m-tiny.jpg" alt="Results @ MSN" border="0" style="margin: 0px; padding: 0px; " width="16" height="16"></a></li>
';

and I basically want to echo everything that appears after

<li style="margin: 0px; padding: 0px; ">

and before

&nbsp;<a href="http://www.goo

The result I would like to see is:

myspace google youtube ebay yahoo craigslist you tube

I tried various approaches I found on Stack Overflow, but they either returned only one word or caused a 500 server error, so maybe YOU know the right solution.

For example:

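// Skip past the first <li ...> opening tag
$startsAt = strpos($out, '<li style="margin: 0px; padding: 0px; ">') + strlen('<li style="margin: 0px; padding: 0px; ">');
// Find the first "&nbsp;<a href=..." marker after that point
$endsAt = strpos($out, '&nbsp;<a href="http://www.goo', $startsAt);
$result = substr($out, $startsAt, $endsAt - $startsAt);

echo $result; // echoes just one word: strpos() returns only the first occurrence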

I know something is missing, probably some kind of foreach over all the matches, but since I am new to PHP I am stuck until I fully understand how this all works. I tried something like foreach($result as match) { echo $match; } and a few variations, but with no success. I must be missing something.
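The one-word result is consistent with how strpos() works: it returns only the first occurrence at or after its offset, so $result above is a single string, not an array, and foreach has nothing to iterate over. Separately, foreach($result as match) is not valid PHP (the loop variable needs a $), and a parse error like that can surface as a 500 error when errors are not displayed. A minimal sketch of the same strpos()/substr() approach repeated in a loop, assuming the start tag and end marker from the question; the $out below is a shortened two-item stand-in for the full string, and $startTag, $endMarker, $offset and $words are illustrative names:

```php
<?php
// Shortened stand-in for the question's $out (two items only)
$out = '<li style="margin: 0px; padding: 0px; ">myspace&nbsp;<a href="http://www.google.com/search?q=myspace">G</a></li>
<li style="margin: 0px; padding: 0px; ">you tube&nbsp;<a href="http://www.google.com/search?q=you%20tube">G</a></li>';

$startTag = '<li style="margin: 0px; padding: 0px; ">';
$endMarker = '&nbsp;<a href="http://www.goo';

$words = array();  // one entry per <li>
$offset = 0;       // where the next search begins

while (($startsAt = strpos($out, $startTag, $offset)) !== false) {
    $startsAt += strlen($startTag);                 // skip past the opening tag
    $endsAt = strpos($out, $endMarker, $startsAt);  // first marker after it
    if ($endsAt === false) {
        break;                                      // no end marker: stop
    }
    $words[] = substr($out, $startsAt, $endsAt - $startsAt);
    $offset = $endsAt;                              // resume after this item
}

// $words is an array, so foreach now has something to iterate over
foreach ($words as $word) {
    echo $word, "\n";
}
```

With the full string from the question, the same loop should collect all seven entries, since each <li> is searched for starting where the previous match ended.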

Don't be too harsh with me; I am still very new to PHP (I started about a week ago), but I'm willing to learn :)

Thank you for your time.

Marcus Weller
  • why are you not using DOM operations? far easier/more reliable than any substring/regex operations you might try. – Marc B Oct 12 '12 at 14:13
  • check [`strip_tags`](http://www.php.net/manual/en/function.strip-tags.php) – air4x Oct 12 '12 at 14:15
  • @air4x - why strip_tags if there are trivial regular expressions that can deal with this? – N.B. Oct 12 '12 at 14:20
  • @N.B. a solution involving regular expressions might have to be changed more frequently, for changes in the input string than when using strip_tags. – air4x Oct 12 '12 at 16:22
  • @air4x - what are you talking about? It makes absolutely no sense. – N.B. Oct 12 '12 at 18:17
  • @N.B. check this [stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – air4x Oct 13 '12 at 03:24
  • @air4x - did you even read what is on that link or are you just throwing links around? Please, just stop, you have no idea what you're talking about. – N.B. Oct 13 '12 at 13:09
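The strip_tags() suggestion from the comments could look roughly like this: remove all markup, drop the leftover &nbsp; entities, and split what remains on line breaks. A minimal sketch, assuming each <li> sits on its own line as in the question's string; the $out below is a shortened two-item stand-in, and $text, $lines and $words are illustrative names:

```php
<?php
// Shortened stand-in for the question's $out; one <li> per line assumed
$out = '
<li style="margin: 0px; padding: 0px; ">myspace&nbsp;<a href="http://www.google.com/search?q=myspace"><img src="g-tiny.jpg"></a>&nbsp;</li>
<li style="margin: 0px; padding: 0px; ">you tube&nbsp;<a href="http://www.google.com/search?q=you%20tube"><img src="g-tiny.jpg"></a>&nbsp;</li>
';

// Remove every tag; strip_tags() leaves entities alone, so text such as
// "myspace&nbsp;&nbsp;" remains on each line
$text = strip_tags($out);

// Drop the &nbsp; entities, then split into lines
$lines = explode("\n", str_replace('&nbsp;', '', $text));

$words = array();
foreach ($lines as $line) {
    $line = trim($line);
    if ($line !== '') {   // skip the blank first and last lines
        $words[] = $line;
    }
}

echo implode(' ', $words), "\n";
```

This depends on the line breaks in the input, which is the kind of formatting assumption the DOM approach avoids.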

3 Answers

if (preg_match_all('/>([^<>]+?)&nbsp;/', $out, $matches)) {
    print_r($matches[1]);
}

This captures the text between > and &nbsp;. In ([^<>]+?), the class [^<>] matches any character except an angle bracket, + repeats it one or more times, and ? makes the repetition lazy so each match is as short as possible. The parentheses capture the matched text so it can be read from $matches[1] afterwards.

Output:

Array
(
    [0] => myspace
    [1] => google
    [2] => youtube
    [3] => ebay
    [4] => yahoo
    [5] => craigslist
    [6] => you tube
)
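To get the single space-separated line the question asks for, the captured array can be joined with implode(). A minimal sketch reusing the regex above, assuming $out holds the question's HTML; the $out below is a shortened two-item stand-in:

```php
<?php
// Shortened stand-in for the question's $out (two items only)
$out = '<li style="margin: 0px; padding: 0px; ">myspace&nbsp;<a href="#"><img src="g.jpg"></a>&nbsp;</li>
<li style="margin: 0px; padding: 0px; ">you tube&nbsp;<a href="#"><img src="g.jpg"></a>&nbsp;</li>';

// Same pattern as above: capture text between ">" and "&nbsp;"
if (preg_match_all('/>([^<>]+?)&nbsp;/', $out, $matches)) {
    // $matches[1] holds only the captured groups; join them with spaces
    echo implode(' ', $matches[1]), "\n";
}
```

The "&nbsp;" that follows a closing </a> is not captured, because [^<>]+? needs at least one character before the &nbsp; and cannot cross the next "<".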
John Kugelman

There are a few things you could do here: explode by line break (to get your <li>..</li> lines as an array), or use a regular expression, which admittedly has a bit of a learning curve. Your idea will work (you're almost there), but it relies on things being formatted exactly a certain way. There are a few ways to avoid that and still get the same result.

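<?php
$out=/*...*/ // the HTML string from the question

// Return the position just after the ">" of the next "<li ...>" tag
// found at or after $last, or -1 if there is none.
function findStart($string, $last = 0) {
   $start = strpos($string, "<li", $last);
   if ($start === false) return -1; // No new start
   $start = strpos($string, ">", $start);
   if ($start === false) return -1; // Malformed <li>?
   return $start + 1; // Don't include the >
}

$set = array(); // Initialise so print_r() works even with no matches
$start = 0;
// PHP evaluates this as 0 < ($start = findStart(...))
while (0 < $start = findStart($out, $start)) {
   $end = strpos($out, "&nbsp;<", $start);
   if ($end === false) {
     break; // No end marker after this <li>: stop
   }
   $set[] = substr($out, $start, $end - $start);
   $start = $end; // Forward the pointer for the next loop
}

// Now $set is an array of the values
print_r($set);
?>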
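The "explode by line break" option mentioned above could be sketched as follows, assuming each <li> sits on its own line as in the question's string. The $out below is a shortened two-item stand-in, and the variable names are illustrative:

```php
<?php
// Shortened stand-in for the question's $out; one <li> per line assumed
$out = '
<li style="margin: 0px; padding: 0px; ">myspace&nbsp;<a href="#">G</a></li>
<li style="margin: 0px; padding: 0px; ">you tube&nbsp;<a href="#">G</a></li>
';

$set = array();
foreach (explode("\n", $out) as $line) {
    $open = strpos($line, '<li');
    if ($open === false) {
        continue;                         // blank or non-<li> line
    }
    $start = strpos($line, '>', $open);   // end of the opening <li ...> tag
    if ($start === false) {
        continue;                         // malformed <li>
    }
    $end = strpos($line, '&nbsp;<', $start);
    if ($end === false) {
        continue;                         // no end marker on this line
    }
    // Text after the ">" and before "&nbsp;<"
    $set[] = substr($line, $start + 1, $end - $start - 1);
}

print_r($set);
```

Working line by line keeps each search confined to one item, at the cost of depending on the line breaks in the input.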
Rudu

Are you trying to parse HTML with PHP so that you get everything inside each LI, including any nested HTML elements? The problem is that an LI can contain another list with the same LI markup, which is tricky to handle with string functions.

Perhaps the DOM functions built into PHP can help here...
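A minimal sketch of the DOM approach with PHP's DOMDocument, assuming the DOM extension is enabled (it is in most builds) and that the wanted word is the text node at the start of each <li>. The $out below is a shortened two-item stand-in for the question's string, and $words is an illustrative name:

```php
<?php
// Shortened stand-in for the question's $out (two items only)
$out = '<li style="margin: 0px; padding: 0px; ">myspace&nbsp;<a href="#"><img src="g.jpg"></a></li>
<li style="margin: 0px; padding: 0px; ">you tube&nbsp;<a href="#"><img src="g.jpg"></a></li>';

$doc = new DOMDocument();
// The input is a fragment, not a full page; collect libxml warnings
// internally instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($out);
libxml_clear_errors();

$words = array();
foreach ($doc->getElementsByTagName('li') as $li) {
    // The first child of each <li> is the text before the first link
    $first = $li->firstChild;
    if ($first !== null && $first->nodeType === XML_TEXT_NODE) {
        // &nbsp; is decoded to U+00A0 (UTF-8 "\xC2\xA0"); remove it and trim
        $words[] = trim(str_replace("\xC2\xA0", '', $first->nodeValue));
    }
}

echo implode(' ', $words), "\n";
```

Because the parser builds a tree, nested lists or changed attribute order in the <li> tags would not break the lookup the way a fixed search string would.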

Zathrus Writer