Why does the Perl code

 undef $/;  # read in entire file or STDIN
 $_ = <>;
 s|<head>.*<\head>|<head>...</head>|s;

applied to a text file containing

 <head>[anything]</head>

produce

 ...

and not

 <head>...</head>

?

When the '<' characters in the substitution REPLACE field are omitted, as in

 s|<head>.*</head>|head>.../head>|s;

the substitution produces

 head>.../head>

The '<' character makes the difference, but I can find no explanation of why.

How does one produce a '<' in the substitution result?

JPF
  • Hi, and welcome to Stack Overflow. Your original code has a bug in it: `s|<head>.*<\head>|`. Note the `\h` instead of `/h`. `\h` matches "horizontal whitespace". Is this your real code, or did you make a mistake pasting it in? Otherwise `s|<head>.*</head>|<head>...</head>|s;` is fine, though [you should not use regexes to parse HTML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Schwern Aug 31 '18 at 22:51
  • Assuming `<\head>` is a mistake, your code does what you expect. Whatever you are using to view the result is probably the cause of your missing tags. Are you looking at the output in a browser? – Borodin Aug 31 '18 at 22:54

2 Answers

The first snippet does not produce the output you claim it does.

$ perl -e'$_ = "<head>foo</head>"; s|<head>.*<\head>|<head>...</head>|s; CORE::say'
<head>foo</head>

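No replacement happens because \h matches a horizontal whitespace character (such as a space or tab). The pattern <\head> therefore requires a '<', one whitespace character, and then the literal text ead>, which the input does not contain.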
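
To see the effect of \h in isolation, here is a minimal sketch (assuming Perl 5.10 or later, where \h is available). The variable names and the second test string are made up for illustration; they are not from the question.

```perl
use strict;
use warnings;

my $input = "<head>foo</head>";

# \h is "horizontal whitespace", so <\head> means:
# "<", one whitespace character, then the literal text "ead>".
my $buggy = qr|<head>.*<\head>|s;
my $fixed = qr|<head>.*</head>|s;

print "buggy matches input: ", ($input =~ $buggy ? "yes" : "no"), "\n";
print "fixed matches input: ", ($input =~ $fixed ? "yes" : "no"), "\n";

# Hypothetical input: the buggy pattern matches only when a
# whitespace character follows the second "<".
my $odd = "<head>foo< ead>";
print "buggy matches '$odd': ", ($odd =~ $buggy ? "yes" : "no"), "\n";
```

The first line should report "no" and the other two "yes", which is consistent with the unchanged output shown above.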

You probably meant to use </head> instead of <\head>. That produces the desired output.

$ perl -e'$_ = "<head>foo</head>"; s|<head>.*</head>|<head>...</head>|s; CORE::say'
<head>...</head>

Nothing even similar to your code produces just ... as you claim. Of course, if you view a file containing <head>...</head> in an HTML viewer, it would appear as .... To produce HTML that renders as <head>...</head>, you'll need to perform some escaping.

$ perl -e'
   use HTML::Escape qw( escape_html );
   $_ = "<head>foo</head>";
   s|<head>.*</head>|<head>...</head>|s;
   CORE::say(escape_html($_));
'
&lt;head&gt;...&lt;/head&gt;
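
If installing HTML::Escape from CPAN is not an option, a rough core-only equivalent might look like the sketch below. The helper name escape_html_basic and the entity table are assumptions made for illustration, not part of any module; it covers only the five characters that are commonly escaped in HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: map each HTML-special character to its entity.
my %entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

sub escape_html_basic {
    my ($text) = @_;
    # Single pass, so an "&" introduced by one replacement is never re-escaped.
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

my $html = "<head>foo</head>";
$html =~ s|<head>.*</head>|<head>...</head>|s;
print escape_html_basic($html), "\n";
```

For this input it should print the same escaped line as the HTML::Escape version.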
ikegami

Assuming <\head> is a mistake, your code does what you expect. Whatever you are using to view the result is probably the cause of your missing tags. Are you looking at the output in a browser?

When you remove the opening <, the tags no longer look like tags to the viewer, so they are displayed as text instead of being interpreted as markup.
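
One way to rule out the viewer is to inspect the string itself rather than how it is rendered. The following is only a sketch, with variable names chosen for illustration; it reports a few facts about the result that no HTML viewer can hide.

```perl
use strict;
use warnings;

my $result = "<head>foo</head>";
$result =~ s|<head>.*</head>|<head>...</head>|s;

# Count the "<" characters: list assignment in scalar context
# yields the number of matches.
my $count = () = $result =~ /</g;

printf "length: %d\n", length $result;
printf "starts with '<': %s\n", (substr($result, 0, 1) eq '<' ? "yes" : "no");
printf "count of '<': %d\n", $count;
```

If the count is 2 and the string starts with '<', the substitution worked and the missing tags are a display issue.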

Borodin