
I want to replace multiple lines, starting from the line after a matched pattern, with a single line.

For example, I have the following section of an html file:

...
<div class="myclass">
  <p textselect="true">
     one
     two
     three
  </p>
</div>
...

I want to end up with

...
<div class="myclass">
  <p textselect="true">
     hello
  </p>
</div>
...

...but I need to match on the string <p textselect="true"> rather than on one, because I don't know in advance what that string (one) is going to be.

Right now my solution is pretty nasty: I append a placeholder after <div class="myclass">, delete the placeholder and the next 3 lines, then append the string I want, all with sed. The solution needs to use either sed or awk.

suren
  • Obligatory [use an HTML/XML parser to parse HTML/XML](https://stackoverflow.com/a/1732454/7552) link. Why do you _need_ awk or sed to do this job? For example, if your HTML is well-formed XML, then `xmlstarlet ed -O -u '//div[@class="myclass"]/p[@textselect="true"]' -v 'hello' file.html` would do it – glenn jackman Mar 10 '22 at 15:48
  • @glennjackman because it goes in a `Dockerfile` with an alpine base image, where I do have sed and awk, but I don't have python, perl or xmlstarlet. And I am trying to keep the image as light as possible without having to add extra layers deleting packages. – suren Mar 10 '22 at 15:59

2 Answers


This might work for you (GNU sed):

sed -E '/<p textselect="true">/{:a;N;/<\/p>/!ba;s/(\n\s*).*\n/\1hello\n/}' file

Gather up the lines from <p textselect="true"> through </p>, then replace everything in between with hello.
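To make the gathering loop easier to follow, here is the same one-liner spread over several lines with comments. This is only a sketch: it assumes GNU sed (for `\s` and for `#` comments inside a braced block), and the function name `sed_replace_p_body` is made up for illustration.

```shell
# Hypothetical wrapper around the GNU sed one-liner above; $1 is the input file.
sed_replace_p_body() {
  sed -E '
    # On the opening tag, start collecting lines into the pattern space.
    /<p textselect="true">/ {
      :a
      # Append the next input line, joined with a newline.
      N
      # Loop back to label a until the closing tag has been read.
      /<\/p>/!ba
      # Keep the first newline and its indentation, then swap everything
      # up to the last newline for the single line "hello".
      s/(\n\s*).*\n/\1hello\n/
    }
  ' "$1"
}
```

Calling `sed_replace_p_body file` prints the edited text to standard output; with GNU sed, adding `-i` to the `sed` invocation would edit the file in place instead.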

Alternative:

sed -n '/<p textselect="true">/{p;n;s/\S\+/hello/;h;:a;n;/<\/p>/!ba;H;g};p' file

N.B. Both solutions expect at least one line to be replaced.

potong

Using any awk in any shell on every Unix box:

$ awk '/<\/p>/{f=0} !f; sub(/<p textselect="true">/,"  hello"){print; f=1}' file
...
<div class="myclass">
  <p textselect="true">
    hello
  </p>
</div>
...

Or, if your replacement string hello might contain & (a backreference metacharacter in the replacement argument of the *sub() functions), then:

$ awk '/<\/p>/{f=0} !f; sub(/<p textselect="true">/,"  "){print $0 "hello"; f=1}' file
...
<div class="myclass">
  <p textselect="true">
    hello
  </p>
</div>
...
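If the replacement text is not known when the script is written, one possible variation is to pass it through the environment, so awk never interprets `&` or backslashes in it. The sketch below is an assumption-laden extension of the commands above, not part of the original answer: the function name `awk_replace_p_body` and the environment variable `repl` are invented for illustration, and the replacement line is indented two spaces deeper than the `<p>` line.

```shell
# Hypothetical helper: $1 is the replacement text, $2 is the input file.
awk_replace_p_body() {
  repl="$1" awk '
    # The closing tag ends the skipped region.
    /<\/p>/ { skip = 0 }
    # Print every line that is not being skipped.
    !skip
    # After printing the opening tag, emit the replacement and start skipping.
    /<p textselect="true">/ {
      match($0, /^[ \t]*/)                   # measure the indentation of the tag
      print substr($0, 1, RLENGTH) "  " ENVIRON["repl"]
      skip = 1
    }
  ' "$2"
}
```

Since awk has no portable in-place option, a typical use would be `awk_replace_p_body 'R&D' file.html > file.html.tmp && mv file.html.tmp file.html`.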
Ed Morton