
I have an HTML file like this:

<html><head>
<title>My Title</title>
</head>
<body>
Title of this page: PAGE_TITLE
</body>
</html>

How can I replace PAGE_TITLE with the text of the <title> element?

I tried this command:

sed -i 's/\(.*?<title>\)\(.*?\)\(<\/title>.*?\)PAGE_TITLE/\1\2\3\2/' page.html

but it doesn't work.

jub0bs
Geograph

3 Answers


Don't use regex to parse HTML. Use a proper XML parser with XPath expressions instead:

# fetch title string
title=$(xml sel -t -v /html/head/title file.html)
# edit file in-place
xml ed -L -u '/html/body/text()' -v "Title of this page: $title" file.html

xml is XMLStarlet (on some systems the command is installed as xmlstarlet).

See also: RegEx match open tags except XHTML self-contained tags

Gilles Quénot

Using awk:

awk '/<title>/ { title = $0; sub(".*<title>", "", title); sub("</title>.*", "", title)}
     /PAGE_TITLE/ { sub("PAGE_TITLE", title); }
     1' filename > filename.new
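One caveat with the awk script above: sub() treats & in its replacement string as "the matched text", so a title such as Q&A would come out as QPAGE_TITLEA. A variant that splices the title in literally with index() and substr() is sketched below. It assumes, like the original, that the <title> element sits on a single line before PAGE_TITLE; the sample title and the page.html / page.html.new file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample page whose title contains "&"
cat > page.html <<'EOF'
<html><head>
<title>Q&A</title>
</head>
<body>
Title of this page: PAGE_TITLE
</body>
</html>
EOF

# Same title extraction as the answer; the placeholder is then replaced by
# plain string splicing, so "&" in the title is copied literally.
awk '/<title>/ { title = $0; sub(".*<title>", "", title); sub("</title>.*", "", title) }
     /PAGE_TITLE/ {
         i = index($0, "PAGE_TITLE")   # position of the placeholder
         $0 = substr($0, 1, i - 1) title substr($0, i + length("PAGE_TITLE"))
     }
     1' page.html > page.html.new && mv page.html.new page.html

cat page.html
```

POSIX awk has no in-place option, so the result goes to a temporary file that is then moved over the original.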
Barmar

The problem with your sed script is that you are using *?, a non-greedy quantifier from Perl-style regular expressions that sed does not support. You can get much the same result with [^<>]* instead of .*?.
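Even with [^<>]*, sed reads its input one line at a time, so in this sample the <title> line and the PAGE_TITLE line are never in the pattern space together and a single s command cannot see both. A minimal sketch that carries the title forward in the hold space, assuming (as in the question) that <title> sits on one line and comes before PAGE_TITLE; the page.html.new temporary name is hypothetical:

```shell
#!/bin/sh
# Sample input from the question
cat > page.html <<'EOF'
<html><head>
<title>My Title</title>
</head>
<body>
Title of this page: PAGE_TITLE
</body>
</html>
EOF

# 1st expression, on the <title> line:
#   h  copy the line to the hold space
#   s  reduce the pattern space to the text between the tags
#   x  swap, so the original line is printed and the title stays held
# 2nd expression, on the PAGE_TITLE line:
#   G  append a newline plus the held title
#   s  put the title where PAGE_TITLE was and drop the appended copy
sed -e '/<title>/{
h
s/.*<title>\([^<>]*\)<\/title>.*/\1/
x
}' -e '/PAGE_TITLE/{
G
s/PAGE_TITLE\(.*\)\n\(.*\)/\2\1/
}' page.html > page.html.new && mv page.html.new page.html

cat page.html
```

Because the title is inserted through a back-reference, characters such as & or / in it are copied literally.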

Also, the <title> element is not allowed inside the HTML <body>, so you should not include it there; doing so produces invalid HTML.

tripleee