I know it is not recommended to parse XML/HTML with a regex, but I am trying to do this simple thing:

<?php
echo phpversion()."<br><br>";

$test_1 = '<Tag attr="attr_value">Tag_value</Tag>';
$test_2 = $test_1.str_repeat(' ',1000);
$test_3 = $test_1.str_repeat(' ',2000);

$match = '!<(.*?) (.*?)="(.*?)">!';
$replace = '<\\2>\\3</\\2><\\1>';

$output_1 = preg_replace($match, $replace, $test_1);
$output_2 = preg_replace($match, $replace, $test_2);
$output_3 = preg_replace($match, $replace, $test_3);

echo "xml: ".htmlspecialchars($test_1)."<br>";
echo "1: ".htmlspecialchars($output_1)."<br>";
echo "2: ".htmlspecialchars($output_2)."<br>";
echo "3: ".htmlspecialchars($output_3)."<br>";
?>

The goal is to move an attribute and its value out of the container tag into an element of its own. Everything works with the test_1 and test_2 examples, but if I add more spaces, as in test_3, the returned string is empty. Can someone try this code?

In this example it still works with 1411 trailing spaces. With one more (1412) it no longer does.

I have tested on 5.3.8 and 5.3.19 PHP versions.

Thanks.

ilvi

2 Answers

Use this regex and it will work correctly:

$match = '!<([^ ]+) ([^=]+)="(.*?)">!';
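For reference, here is a minimal sketch of that pattern applied to the inputs from the question. The padding sizes and the $results array are only illustrative. The negated character classes cannot run past a space or an equals sign, so the engine should have far less to backtrack over when it reaches the closing tag and the trailing spaces.

```php
<?php
// Pattern from this answer, with the replacement string from the question.
$test    = '<Tag attr="attr_value">Tag_value</Tag>';
$match   = '!<([^ ]+) ([^=]+)="(.*?)">!';
$replace = '<\\2>\\3</\\2><\\1>';

$results = array();
foreach (array(0, 1000, 2000) as $padding) {
    $input  = $test . str_repeat(' ', $padding);
    $output = preg_replace($match, $replace, $input);

    // preg_replace() returns NULL when PCRE reports an error.
    if ($output === null) {
        echo "$padding spaces: failed\n";
        continue;
    }

    // Trim the trailing padding so the result is easy to read.
    $results[$padding] = rtrim($output);
    echo "$padding spaces: " . $results[$padding] . "\n";
}
```

Each line should show the attribute moved into its own element in front of the original tag.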
kittycat
  • This is also about 300x faster than my original pattern, measured with microtime and a loop. Nice. It is even 10% to 15% faster than Mikhail's solution. – ilvi Feb 15 '13 at 13:21

Works fine for me on PHP 4.4.8 from the command line. Your expression seems to be very inefficient. It probably causes some kind of error, e.g. out of memory, so preg_replace returns NULL, which means "error". Here is an optimized version of your expression:

<(\S*?) (\S*?)="([^"]*?)">
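To find out which error it actually is, you can ask PCRE directly. The sketch below reruns the original pattern from the question and inspects preg_last_error(). My guess, and it is only a guess, is that the code will be PREG_BACKTRACK_LIMIT_ERROR rather than an out-of-memory condition, since the lazy groups have to be retried over every run of trailing spaces. The outcome depends on the PHP version and on the pcre.backtrack_limit setting, so on some setups the call may simply succeed.

```php
<?php
// Original pattern and replacement from the question, with 2000 trailing spaces.
$xml     = '<Tag attr="attr_value">Tag_value</Tag>' . str_repeat(' ', 2000);
$match   = '!<(.*?) (.*?)="(.*?)">!';
$replace = '<\\2>\\3</\\2><\\1>';

$result = preg_replace($match, $replace, $xml);

if ($result === null) {
    // preg_last_error() returns one of the PREG_*_ERROR constants.
    $code = preg_last_error();
    if ($code === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "Backtrack limit hit (pcre.backtrack_limit = "
            . ini_get('pcre.backtrack_limit') . ")\n";
    } else {
        echo "preg_replace failed with error code $code\n";
    }
} else {
    // No error on this setup; trim the padding for readability.
    echo "OK: " . rtrim($result) . "\n";
}
```

If the backtrack limit is the cause, raising it with ini_set('pcre.backtrack_limit', ...) would only move the threshold. A pattern that backtracks less is the more robust fix.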
Mikhail Vladimirov
  • It works if I change $replace to: $replace = '<\\2>\\3\\2>\\1'; Thanks – ilvi Feb 15 '13 at 13:00
  • In the end I chose this solution; it works with big strings containing lots of tags / attributes. Thank you again. – ilvi Feb 15 '13 at 14:38