sed substitution: for similar regex pattern match, the substitution result is very different

Asked Dec 28 '17 at 08:25

Active Dec 28 '17 at 08:59

I am using sed to strip all HTML tags from a file.

Text:

<html>
   <body>
      <h1>Hello World!</h1>
   </body>
</html>

I have checked that basic regular expressions <.*\?> and <[^>]*> match only HTML tags in the text.

When I use sed 's/<.*\?>//g' [input-file], sed removes everything and five blank lines are printed, whereas sed 's/<[^>]*>//g' [input-file] produces the correct output: two blank lines, then Hello World! with its indentation, then two more blank lines.

Why does it behave differently for similar matches?

edited Dec 28 '17 at 08:31

HarshvardhanSharma

Do not use `sed` for HTML text parsing, use syntax aware parsers – Inian Dec 28 '17 at 08:27

sed doesn't support non-greedy quantifiers; see https://www.gnu.org/software/sed/manual/sed.html#Regular-Expressions-Overview and https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y – Sundeep Dec 28 '17 at 09:05
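
To make the difference concrete, here is a minimal sketch, assuming a POSIX shell and a sed that uses basic regular expressions. It uses the portable `<.*>` in place of `<.*\?>`: since sed has no lazy quantifier (in GNU sed, `\?` means "zero or one"), both forms should consume everything from the first `<` to the last `>` on the line. The variable names `line`, `greedy`, and `negated` are illustrative only.

```shell
# One line from the sample input, with its indentation.
line='      <h1>Hello World!</h1>'

# Greedy match: .* runs from the first '<' to the LAST '>' on the line,
# so the text between the two tags is deleted as well.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>//g')

# Negated class: [^>]* cannot cross a '>', so each tag is matched on its own.
negated=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

printf '[%s]\n' "$greedy"    # prints "[      ]"
printf '[%s]\n' "$negated"   # prints "[      Hello World!]"
```

The brackets in the output are only there to make the leftover indentation visible; the "blank" lines in the question still contain those spaces.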
