PHP regex linebreak

Question

Simple regex question on extracting comments with linebreaks:

String:

   <description language="de">Diese Tabelle zeigt die Zugangswege der Besucher auf die Website</description>

   <options>
      <!-- Hier stehen die Optionen für den View, die sich nicht auf colums beziehen.
           Bisher gibt es da nix, kann aber mal nicht schaden das vorzusehen  -->
   </options>


   <defaultcolumn>
      <!-- Hier können für Basiswerte für alle Spalten definiert werden. 
           Die Spaltendefinition weiter unten gibt die Möglichkeit die Werte je Spalte zu überschreiben
           Welche Optionen es gibt (incl. Titel, Description und Emptycelltext) siehe "allvaluescolumn" oben. 
      -->
      <options>
         <option name="align" value="left"><!-- (left|center|right), default left --></option>

My regex attempt:

/<!--(.*)-->/

This extracts all one-line comments.

Question:

How do I get all comments, including the multiline ones? Adding \n or \r\n did not work.

score 2 · Answer 1 · answered Oct 11 '12 at 08:37

The correct way to do this, as is so often the case when dealing with an (X)HTML/XML string, is not to use regex at all, but instead to use DOM and XPath.

To get all comments in the document, the XPath query you want is:

//comment()

For example:

$str = '<description language="de">Diese Tabelle zeigt die Zugangswege der Besucher auf die Website</description>

<options>
  <!-- Hier stehen die Optionen für den View, die sich nicht auf colums beziehen.
       Bisher gibt es da nix, kann aber mal nicht schaden das vorzusehen  -->
</options>


<defaultcolumn>
  <!-- Hier können für Basiswerte für alle Spalten definiert werden. 
       Die Spaltendefinition weiter unten gibt die Möglichkeit die Werte je Spalte zu überschreiben
       Welche Optionen es gibt (incl. Titel, Description und Emptycelltext) siehe "allvaluescolumn" oben. 
  -->
  <options>
     <option name="align" value="left"><!-- (left|center|right), default left --></option>';

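// Parse the fragment with the lenient HTML parser; @ suppresses warnings about the unknown tags.
$doc = new DOMDocument('1.0');
@$doc->loadHTML($str);
$xpath = new DOMXPath($doc);

// Select every comment node anywhere in the document.
$nodes = $xpath->query('//comment()');

$comments = array();

// Collect the text of each comment without surrounding whitespace.
foreach ($nodes as $node) {
    $comments[] = trim($node->nodeValue);
}

print_r($comments);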
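
Since the stated goal in the comments is to remove the comments from a file, the same XPath query can also drive a removal. The following is only a sketch: the sample string is hypothetical and, unlike the fragment above, well-formed with a single root element, so it can be loaded with loadXML().

```php
<?php
// Hypothetical well-formed sample; loadXML() requires a single root element.
$xml = '<view><options><!-- note --></options>'
     . '<option name="align" value="left"><!-- (left|center|right) --></option></view>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Copy the matched comment nodes into an array, then detach each one from its parent.
foreach (iterator_to_array($xpath->query('//comment()')) as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize only the root element, without the XML declaration.
$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
```

The elements and attributes are left untouched; only the comment nodes are dropped.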


Thanks for this answer, but I wanted a regex solution. I'll use XPath for future stuff. — Shlomo, Oct 11 '12 at 10:30
@Azincourt, I advise you to use this answer instead of regex. Not only does regex become extremely difficult with more complex problems, but XPath is also a tool specifically designed for searching in these types of strings. — Sem, Oct 11 '12 at 11:03
Actually, I am not using it in production at the moment, only in a short code snippet. I just wanted to remove the comments in a single file and wanted to know why my regex was wrong. That is why I said I WILL use it in the future. — Shlomo, Oct 11 '12 at 14:25

score 1 · Accepted Answer · answered Oct 11 '12 at 08:31


Try

/<!--(.*?)-->/s

By default, the . does not match newline characters, so you need to enable dotall mode by adding s after the closing regex delimiter. (s is the single-line modifier: it treats the whole string as one single line, i.e. it makes the dot match newlines as well.)
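
A minimal sketch of the difference the s modifier makes, using a hypothetical two-comment sample rather than the string from the question. Both patterns use the same quantifier so that only the modifier differs.

```php
<?php
// Hypothetical sample input: one single-line and one multi-line comment.
$xml = "<options>\n"
     . "  <!-- one line -->\n"
     . "  <!-- first line\n"
     . "       second line -->\n"
     . "</options>";

// Without s, the dot stops at newlines, so the multi-line comment cannot match.
$withoutS = preg_match_all('/<!--(.*?)-->/', $xml, $m1);

// With s (dotall), the dot also matches newlines, so both comments match.
$withS = preg_match_all('/<!--(.*?)-->/s', $xml, $m2);

echo "without s: $withoutS, with s: $withS\n"; // prints "without s: 1, with s: 2"
```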

I also made the quantifier ungreedy by adding a ? after it; otherwise it would match from the first opening <!-- to the last closing -->.
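
To illustrate the greedy versus ungreedy behaviour, here is a short sketch with a hypothetical sample string. It also shows how the same pattern could be passed to preg_replace() to strip the comments instead of extracting them; the variable names are made up for the example.

```php
<?php
// Hypothetical sample input with two separate comments.
$xml = "<a><!-- first --></a>\n<b><!-- second\n     line --></b>";

// Greedy: a single match running from the first <!-- to the last -->.
preg_match_all('/<!--(.*)-->/s', $xml, $greedy);

// Ungreedy: each comment is matched on its own.
preg_match_all('/<!--(.*?)-->/s', $xml, $lazy);

// Capture group 1 holds the comment bodies; trim the surrounding whitespace.
$comments = array_map('trim', $lazy[1]);

// Removing the comments uses the same pattern with an empty replacement.
$stripped = preg_replace('/<!--.*?-->/s', '', $xml);

echo count($greedy[0]), " greedy match, ", count($lazy[0]), " ungreedy matches\n";
print_r($comments);
echo $stripped, "\n";
```

Note that a regex like this treats the input as plain text, so it would also match a comment-like sequence inside a CDATA section or an attribute value.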

stema


@Azincourt Did you forget to? – Daedalus Oct 11 '12 at 08:47
