I am trying to run preg_match on a menu, but PHP keeps skipping what should be the first match, and I can't find the reason.

<div id="subnav">
<div class="wrap">
<ul id="menu-patternstutorials" class="nav superfish">
  <li id="menu-item-11512" class="menu-item menu-item-type-custom menu-item-object-custom current-menu-item current_page_item menu-item-home current-menu-ancestor menu-item-11512">
    <a href="http://localhost/Sites/craftpassion/">Patterns Tutorials</a> 
    <ul class="sub-menu"> 
  <li id="menu-item-11506" class="menu-item menu-item-type-taxonomy menu-item-object-category current-menu-ancestor current-menu-parent menu-item-11506 star-li-bg">
      <a title=" TEST" href="http://localhost/Sites/craftpassion/category/needle-craft/sewing">Sewing</a> 
  <ul class="sub-menu"> 
    <li id="menu-item-11508" class="menu-item menu-item-type-custom menu-item-object-custom current-menu-item current_page_item menu-item-home current-menu-ancestor current-menu-parent menu-item-11508">
        <a href="http://localhost/Sites/craftpassion/">Basic Techniques</a> 
    <ul class="sub-menu"> 

The PHP

    $pattern = '#<ul[^`]*?>[\s]*?<li [^`]*?>[\s]*?<a[^`]*?>([^`]*?)</a>[\s]*?<ul[^`]*?>#i';
    preg_match($pattern, $menu, $matches);

I was expecting:

<ul id="menu-patternstutorials" class="nav superfish">
  <li id="menu-item-11512" class="menu-item menu-item-type-custom menu-item-object-custom current-menu-item current_page_item menu-item-home current-menu-ancestor menu-item-11512">
    <a href="http://localhost/Sites/craftpassion/">Patterns Tutorials</a> 
    <ul class="sub-menu">

But I keep getting:

<ul class="sub-menu"> 
  <li id="menu-item-11506" class="menu-item menu-item-type-taxonomy menu-item-object-category current-menu-ancestor current-menu-parent menu-item-11506 star-li-bg">
      <a title=" TEST" href="http://localhost/Sites/craftpassion/category/needle-craft/sewing">Sewing</a> 
  <ul class="sub-menu">

Why is it not matching the first block, as expected?

Rob
  • Where is the HTML coming from? Any reason why you don't use a DOM parser? – Felix Kling May 24 '11 at 08:39
  • see: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Denis de Bernardy May 24 '11 at 08:41
  • To test your hypothesis without thinking very much (I'm not gonna parse that thing by hand): copy and paste the first "should-be match" and see whether both copies are still skipped, or only the first. –  May 24 '11 at 08:41
  • @Denis: I _immediately_ searched for this when i saw the pattern! :D – jwueller May 24 '11 at 08:42
  • Congrats. You wrote a regex that you cannot yourself debug. You might be using the wrong tool. I know I've been using regexen to do hacky stuff like this, but I never thought about bothering other people with that. This question will never become useful to others (beyond the level of: "steer clear, down that road lies trouble"). Meanwhile [how to debug a regex](http://stackoverflow.com/questions/2348694/how-do-you-debug-a-regex) – sehe May 24 '11 at 08:46
  • In RegexBuddy, your regex *is* matching both substrings. Have you tried using `preg_match_all()` to see whether the results are different? – Tim Pietzcker May 24 '11 at 10:37

2 Answers

HTML != regular language. See:

RegEx match open tags except XHTML self-contained tags

Denis de Bernardy

Do not parse (X)HTML using regular expressions. It is not possible to do that (properly), since you are not dealing with a regular language (which is what regular expressions are able to handle). Use a DOM or SAX parser instead.
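
A minimal sketch of the DOM approach, assuming PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`) is available and the menu is held in a string such as the `$menu` variable from the question. The sample markup below is a trimmed, closed-off version of that menu, and `parentItemTitles` is a hypothetical helper name used only for illustration. It collects the link text of every list item that contains a nested list:

```php
<?php
// Hypothetical sample input: a trimmed, well-formed version of the menu in the question.
$menu = <<<'HTML'
<ul id="menu-patternstutorials" class="nav superfish">
  <li id="menu-item-11512">
    <a href="http://localhost/Sites/craftpassion/">Patterns Tutorials</a>
    <ul class="sub-menu">
      <li id="menu-item-11506">
        <a href="http://localhost/Sites/craftpassion/category/needle-craft/sewing">Sewing</a>
        <ul class="sub-menu">
          <li id="menu-item-11508"><a href="http://localhost/Sites/craftpassion/">Basic Techniques</a></li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
HTML;

/**
 * Return the text of the first direct link of every <li> that has a <ul> child.
 */
function parentItemTitles(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $titles = [];

    // li[ul] selects list items with a nested <ul>; a[1] is their first direct link.
    foreach ($xpath->query('//li[ul]/a[1]') as $link) {
        $titles[] = trim($link->textContent);
    }

    return $titles;
}

print_r(parentItemTitles($menu));
```

With the sample input this should list "Patterns Tutorials" and "Sewing" in document order; "Basic Techniques" is left out because its list item has no nested list in the trimmed markup.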

jwueller
  • So maybe you can point me in a better direction. Here is what I want to get done. This menu is already inside a PHP variable. I want to find the list items that have another unordered list inside them, then use the text inside each such list item's link to make a new list item at the beginning of the nested unordered list that will act as a "title". Final structure looks like: – Rob May 24 '11 at 18:46