You have a few problems to deal with here.
PHP does not support the g (global) modifier (use preg_match_all() when you want every match), and the m (multi-line) modifier only makes ^ and $ match at the beginning and end of each line. You can remove both; neither is needed here.
You need to account for whitespace between the th and td elements.
You are repeating the capturing group (.)* so only the last iteration is captured; in this case the letter s in Paris would be captured instead of the entire contents of that td element.
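To make the last point concrete, here is a minimal sketch comparing a repeated capturing group with a group that wraps the quantifier. The sample $html string and the variable names are assumptions made up for illustration.

```php
<?php
// Sample markup, assumed for illustration only.
$html = "<th>City :</th>\n<td>Paris</td>";

// Repeated group: (.)* overwrites the capture on every iteration,
// so group 1 ends up holding only the last character it matched.
preg_match('~<td>(.)*</td>~', $html, $repeated);

// Quantifier inside the group: the whole run of characters is captured.
preg_match('~<td>(.*)</td>~', $html, $whole);

echo $repeated[1], "\n"; // s
echo $whole[1], "\n";    // Paris
```

Moving the quantifier inside the parentheses is the only change between the two patterns.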
For this simple case, the following would be enough:
~<th>City :</th>\s*<td>(.*?)</td>~i
Note: The * quantifier following the dot . means "match any character except a newline, zero or more times". Adding a question mark after the quantifier (*?) tells the engine to make a non-greedy match, so the capture stops at the first </td> instead of the last one.
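As a rough illustration of the difference, the sketch below runs the pattern above next to a greedy variant. The sample $html string (two cells on one line) is an assumption chosen so that the two results differ.

```php
<?php
// Sample markup, assumed for illustration; two cells on the same line
// make the greedy/non-greedy difference visible.
$html = '<tr><th>City :</th> <td>Paris</td><td>France</td></tr>';

// Non-greedy: the capture stops at the first </td>.
preg_match('~<th>City :</th>\s*<td>(.*?)</td>~i', $html, $lazy);

// Greedy: the capture runs up to the last </td> on the line.
preg_match('~<th>City :</th>\s*<td>(.*)</td>~i', $html, $greedy);

echo $lazy[1], "\n";   // Paris
echo $greedy[1], "\n"; // Paris</td><td>France
```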
However, for parsing HTML in the future, I would recommend using a proper parser such as PHP's DOM extension rather than a regular expression.
// loadHTML() is an instance method, so create the document first.
$dom = new DOMDocument();
$dom->loadHTML('
<tr>
<th>postal code :</th>
<td>75012</td>
</tr>
<tr>
<th>City :</th>
<td>Paris</td>
</tr>
');
$xp = new DOMXPath($dom);
// Select the first element that follows the th whose text contains "City".
$td = $xp->query('//th[contains(.,"City")]/following-sibling::*[1]');
echo $td->item(0)->nodeValue; //=> "Paris"
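If you need this lookup in more than one place, it could be wrapped in a small function. The following is only a sketch: cellAfterHeader() is a hypothetical helper name, the surrounding table element and the sample markup are assumptions, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper, not part of the DOM extension: returns the text of
// the td that follows the th containing $label, or null if none is found.
function cellAfterHeader(string $html, string $label): ?string
{
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    // $label is interpolated into the XPath expression, so this sketch
    // assumes it contains no double quotes.
    $nodes = $xp->query('//th[contains(., "' . $label . '")]/following-sibling::td[1]');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

// Sample markup, assumed for illustration.
$html = '<table><tr><th>postal code :</th><td>75012</td></tr>'
      . '<tr><th>City :</th><td>Paris</td></tr></table>';

echo cellAfterHeader($html, 'City'), "\n";   // Paris
var_dump(cellAfterHeader($html, 'Country')); // NULL
```

Returning null for a missing header avoids calling nodeValue on a non-existent node, which is what would happen with item(0) on an empty result.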