0

How do I remove the text between two patterns in a line of a file? I have a list of lines; here I show only two for simplicity.

 <sup id="Gen.2.23" class="v0_2_23">23</sup>Anke Adam pulo:</span></p><p class="q2"><span class="v0_2_23">“La ke non nerrepi-heihei pen arrepi-lo lapen ne-ok pen a-ok-lo;</span></p><p class="q2"><span class="v0_2_23">bangpi aphan ‘Arloso’ pusi hangpo,</span></p><p class="q2"><span class="v0_2_23">pima bangpi ke Pinso pensi enlo.”</span></p>
 <sup id="Gen.2.24" class="v0_2_24">24</sup>Anke Adam pulo:</span></p><p class="q2"><span class="v0_2_24">“La ke non nerrepi-heihei pen arrepi-lo lapen ne-ok pen a-ok-lo;</span></p><p class="q2"><span class="v0_2_24">bangpi aphan ‘Arloso’ pusi hangpo,</span></p><p class="q2"><span class="v0_2_24">pima bangpi ke Pinso pensi enlo.”</span></p>

I want to remove the text from </span></p><p class="q2"> up to the next ">

The output I need is shown below:

 <sup id="Gen.2.23" class="v0_2_23">23</sup>Anke Adam pulo: “La ke non nerrepi-heihei pen arrepi-lo lapen ne-ok pen a-ok-lo;bangpi aphan ‘Arloso’ pusi hangpo, pima bangpi ke Pinso pensi enlo.”</span></p>
 <sup id="Gen.2.24" class="v0_2_24">24</sup>Anke Adam pulo: “La ke non nerrepi-heihei pen arrepi-lo lapen ne-ok pen a-ok-lo;bangpi aphan ‘Arloso’ pusi hangpo, pima bangpi ke Pinso pensi enlo.”</span></p>

When I used sed 's/<\/span><\/p><p class="q2">*.*">//g', it removed everything from the first </span> to the last "> on the line.

Rorschach
Biki Teron

2 Answers

1

It looks like you are looking for a non-greedy match; otherwise the .*"> will match as much as possible on the line. The syntax for non-greedy matching is generally *?, although I don't believe it is supported by sed. So, for your case, you could do something like:

perl -pe 's;</span></p><p class="q2">.*?">;;g' input.html

But, as @melpomene suggests, regexps aren't a good choice for HTML parsing.

Rorschach
0

It looks like this yields what you want:

sed 's/<\/span><\/p><p class="q2"><span class="v0_2_23">//g' file

To avoid escaping the slashes, you can use a different separator, such as |:

 sed 's|</span></p><p class="q2"><span class="v0_2_23">||g' file
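
Note that the commands above hard-code v0_2_23, so a line whose class is v0_2_24 would be left untouched. A possible generalization, sketched here on two hypothetical shortened lines, is to match the class value with the bracket expression [^"]*, which sed does support and which stops at the first closing quote:

```shell
# Two hypothetical sample lines with different class values (shortened from the question).
input='<sup>23</sup>A</span></p><p class="q2"><span class="v0_2_23">B</span></p>
<sup>24</sup>C</span></p><p class="q2"><span class="v0_2_24">D</span></p>'

# [^"]* matches any run of non-quote characters, so the match ends at the
# closing quote of the class attribute, whatever its value is.
result=$(printf '%s\n' "$input" | sed 's|</span></p><p class="q2"><span class="[^"]*">||g')

printf '%s\n' "$result"
# <sup>23</sup>AB</span></p>
# <sup>24</sup>CD</span></p>
```

This relies on the class value never containing a double quote, which holds for the lines shown in the question.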
goose goose