3

I need a regular expression to replace <, > and & with &lt;, &gt; and &amp;, but I want to replace these characters only inside the <body></body> tags.

Example

Before replacing

<head> <><><>& </head> <body><><>&</body>

After replacing

<head> <><><>& </head> <body>&lt;&gt;&lt;&gt;&amp;</body>

thank you :)

php12345

3 Answers

0

I think that what you really need is:

  1. An XML parser to parse your string and get the <body> section. See this question for more information;
  2. htmlspecialchars() for the result.

Edit: If you know exactly what the HTML looks like, you can of course also explode on </head> or <body> to split your input in two, but that would be highly dependent on the exact format of the input, so I would not recommend it.
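
As a rough illustration of the splitting idea, here is a minimal sketch that uses strpos()/strrpos() instead of explode(). The function name escapeBodyBySplitting is hypothetical, and the sketch assumes PHP 7+, exactly one literal <body>...</body> pair with no attributes, and body text that is not already escaped (existing entities such as &amp; would be escaped again).

```php
<?php
// Hypothetical helper: escape &, < and > only between the first <body>
// and the last </body>. Assumes the tags appear literally, without attributes.
function escapeBodyBySplitting(string $html): string
{
    $open  = '<body>';
    $close = '</body>';

    $start = strpos($html, $open);
    $end   = strrpos($html, $close);
    if ($start === false || $end === false || $end < $start + strlen($open)) {
        return $html; // markers not found or out of order: leave input untouched
    }
    $start += strlen($open); // move past the opening tag

    $before = substr($html, 0, $start);
    $body   = substr($html, $start, $end - $start);
    $after  = substr($html, $end);

    // ENT_NOQUOTES converts only &, < and > and leaves quotes alone;
    // ENT_SUBSTITUTE avoids an empty result on invalid UTF-8 input.
    return $before
        . htmlspecialchars($body, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8')
        . $after;
}
```

Everything outside the body, including the unescaped characters in <head>, is passed through unchanged.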

jeroen
  • I need this because I have a large XML file containing these <, > and & characters, and simplexml_load_file() can't read it. For now I load the file with file_get_contents() and then want to replace them with preg_replace(). – php12345 Jun 22 '13 at 13:55
  • @php12345 Perhaps something like XMLReader would work, see the question I linked to. – jeroen Jun 22 '13 at 13:57
0

Description

To do this with a regex, it'll need to be done in a couple of steps:

  1. Capture the body's inner string:

    regex: (^.*?<body>)(.*)(<\/body>)$

    Matches:

    [0] => <head> <><><>& </head> <body><><>&</body>
    [1] => <head> <><><>& </head> <body>
    [2] => <><>&
    [3] => </body>
    
  2. Replace each type of character separately inside matches[2]

  3. Reconstruct the string

PHP code example

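// Sample input from the question
$sourcestring = "<head> <><><>& </head> <body><><>&</body>";
preg_match('/(^.*?<body>)(.*)(<\/body>)$/ims', $sourcestring, $matches);

// PHP variable names are case-sensitive, so use $matches throughout
$header = $matches[1];
$body   = $matches[2];
$footer = $matches[3];

// Escape & first; otherwise the & in &lt; and &gt; would be escaped again
$body = preg_replace('/&/', '&amp;', $body);
$body = preg_replace('/</', '&lt;', $body);
$body = preg_replace('/>/', '&gt;', $body);

$output = $header . $body . $footer;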
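
The capture, replace and reconstruct steps could also be folded into a single call. The following is only a sketch under assumptions: the function name escapeBodyWithCallback is hypothetical, PHP 7+ is assumed, and it swaps in preg_replace_callback() with strtr() rather than three separate preg_replace() calls.

```php
<?php
// Hypothetical helper: capture <body>, its contents and </body>, then escape
// only the contents inside the callback.
function escapeBodyWithCallback(string $html): string
{
    $result = preg_replace_callback(
        '/(<body>)(.*)(<\/body>)/is', // s: "." also matches newlines
        function (array $m): string {
            // strtr() with an array does not rescan text it has already
            // replaced, so the "&" in "&lt;" is never escaped a second time.
            $escaped = strtr($m[2], ['&' => '&amp;', '<' => '&lt;', '>' => '&gt;']);
            return $m[1] . $escaped . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

Because strtr() handles all three characters in one pass, the order of the pairs in the array does not matter here.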
Ro Yo Mi
0

I did it with a small trick. First I find the text between the body tags, escape it with htmlspecialchars() and save it. Then I replace the text between the body tags with the placeholder [TO_BE_REPLACED], and finally I swap the placeholder for the escaped text.

<?php
$str = '<head> <><><>& </head> <body><><>&</body>';
preg_match('/<body>(.*?)<\/body>/', $str, $match);
$special = htmlspecialchars($match[1]); // you can use html entities as well
$str = preg_replace('/<body>(.*?)<\/body>/','<body>[TO_BE_REPLACED]</body>',$str);
echo htmlspecialchars(str_replace('[TO_BE_REPLACED]', $special, $str)); // escaped again only so the result is visible in a browser
echo '<br>----<br>';
echo str_replace('[TO_BE_REPLACED]', $special, $str);
?>

Robert
  • This also changes double and single quotes to their respective entities, which wasn't asked for in the OP. – Ro Yo Mi Jun 23 '13 at 04:22
  • @Denomales he can use arrays with str_replace if htmlspecialchars() is not needed. He didn't mention that **ONLY** these characters are to be replaced. – Robert Jun 23 '13 at 20:53
  • True, however he was very explicit with which characters needed to be replaced. – Ro Yo Mi Jun 23 '13 at 20:59
  • @Denomales which looks exactly like HTML encoding... By the way, your answer calls preg_replace three times where one call would do, passing one array of patterns and a second array of replacements. The simplest way, though, is str_replace, also with arrays as arguments. What's the sense of writing a regex for "<"? I don't see any. – Robert Jun 24 '13 at 06:28
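
For reference, the str_replace() variant with arrays mentioned in the comments might look like the sketch below. The function name escapeThreeChars is hypothetical and PHP 7+ is assumed. str_replace() applies the pairs one after another to the running result, so & has to come first.

```php
<?php
// Hypothetical helper: escape only &, < and >, leaving quotes untouched.
function escapeThreeChars(string $text): string
{
    // "&" must be first: str_replace() processes the pairs sequentially,
    // so a later "&" rule would also hit the "&" in "&lt;" and "&gt;".
    return str_replace(['&', '<', '>'], ['&amp;', '&lt;', '&gt;'], $text);
}

// Applied to the body text from the question.
echo escapeThreeChars('<><>&'), "\n";
```

Unlike htmlspecialchars() with its default flags, this leaves double quotes as they are, which matches the three characters listed in the question.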