
I want to extract the lines of a file that lie between <div class="AA"> and <div class="clear"></div>.

Solutions using regular expressions with sed or grep are welcome as well.
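Since sed is mentioned, here is a minimal sketch using sed's `/start/,/end/` range address. It assumes a POSIX shell with `mktemp`, and the sample file is made up to mimic the data in the update. It prints each line from `<div class="AA">` through the next line containing `<div class="clear"></div>`, with both marker lines included. In data shaped like this, the first such line is the one ending in `Details`, so the range may stop earlier than intended.

```shell
#!/bin/sh
# Made-up sample standing in for the real (huge) file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
RUBBISH
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
        <div class="clear"></div><a href="/TEST5" class="details">Details</a>
      </div>
      <pre>HHH</pre>
      <div class="clear"></div>
RUBBISH
EOF

# -n suppresses automatic printing; the /start/,/end/ range prints every line
# from a <div class="AA"> line through the next line containing
# <div class="clear"></div> (both included). The range re-arms at the next
# <div class="AA">, so repeated blocks are printed too.
result=$(sed -n '/<div class="AA">/,/<div class="clear"><\/div>/p' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```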

Update

Here is part of my huge XML file:

RUBBISH
RUBBISH
.
.
.
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
      <div class="BB"><span>CCC</span><br/><a href="/TEST1" class="B">DDD</a>
        <div></div><span>EEE</span><br/><img src="TEST2" title="C"/><a href="/TEST3" class="D">FFF</a>,
    <a href="/TEST4" class="E">GGG</a>
        <div class="clear"></div><a href="/TEST5" class="details">Details</a>
      </div>
      <pre>HHH</pre>
      <div class="clear"></div>
    .
    .
    .
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
      <div class="BB"><span>CCC</span><br/><a href="/TEST1" class="B">DDD</a>
        <div></div><span>EEE</span><br/><img src="TEST2" title="C"/><a href="/TEST3" class="D">FFF</a>,
    <a href="/TEST4" class="E">GGG</a>
        <div class="clear"></div><a href="/TEST5" class="details">Details</a>
      </div>
      <pre>HHH</pre>
      <div class="clear"></div>


RUBBISH
RUBBISH


    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
      <div class="BB"><span>CCC</span><br/><a href="/TEST1" class="B">DDD</a>
        <div></div><span>EEE</span><br/><img src="TEST2" title="C"/><a href="/TEST3" class="D">FFF</a>,
    <a href="/TEST4" class="E">GGG</a>
        <div class="clear"></div><a href="/TEST5" class="details">Details</a>
      </div>
      <pre>HHH</pre>
      <div class="clear"></div>
    .
    .
    .
MLSC

2 Answers

awk '/<div class="clear"><\/div>/{p=0} p{print} /<div class="results-count">/{p=1}' file
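A minimal, self-contained way to try this one-liner, assuming a POSIX shell with `mktemp` and a made-up two-block sample. The comments spell out why the order of the three rules keeps both marker lines out of the output.

```shell
#!/bin/sh
# Made-up sample with two blocks surrounded by junk lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
RUBBISH
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <pre>HHH</pre>
      <div class="clear"></div>
RUBBISH
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <pre>HHH</pre>
      <div class="clear"></div>
RUBBISH
EOF

# Rule order matters:
#  1. a "clear" line switches printing off before it can be printed;
#  2. while the flag p is set, the current line is printed;
#  3. a "results-count" line switches printing on only after rule 2 ran,
#     so that marker line itself is not printed.
result=$(awk '/<div class="clear"><\/div>/{p=0} p{print} /<div class="results-count">/{p=1}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```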
Amadan
  • Thank you. I have multiple occurrences of this pattern in my huge XML file. What should I do about that? – MLSC Jan 21 '15 at 05:32
  • What you should do is describe your problem in more detail. My code will work for multiple blocks, but it may not do what you want, so... what do you want that this code is not already doing? – Amadan Jan 21 '15 at 05:33
  • Please check the update. I have many of these blocks and also some extra XML tags, but I only want these blocks. – MLSC Jan 21 '15 at 05:37
  • And again, having looked at the sample data, I still fail to see what you need that the snippet in my answer does not address. Please describe the difference between your expectations and the output of my code. You keep saying you have "multiple of this pattern in my huge xml file"; my code handles multiple occurrences. – Amadan Jan 21 '15 at 05:40
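On the multiple-blocks point: the flag in the awk one-liner is simply switched on again at every `results-count` line, so every block is printed. If the blocks also need to be told apart, one possible variant (an assumption, not part of the answer) adds a hypothetical counter `n` that prints a header line per block. The sample file is made up.

```shell
#!/bin/sh
# Made-up sample with two blocks, as in the question's update.
sample=$(mktemp)
cat > "$sample" <<'EOF'
RUBBISH
          <div class="results-count">AAA</div>
    <div class="AA">
      <pre>HHH</pre>
      <div class="clear"></div>
RUBBISH
          <div class="results-count">AAA</div>
    <div class="AA">
      <pre>HHH</pre>
      <div class="clear"></div>
EOF

# Same flag logic as the answer, plus a counter n that prints a header
# line each time a new block starts, so the blocks can be told apart.
result=$(awk '
  /<div class="clear"><\/div>/ { p = 0 }
  p { print }
  /<div class="results-count">/ { p = 1; n++; print "--- block " n " ---" }
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```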

Using grep:

$ grep -ozP '(?s)(?:\n|^)\s*<div class="results-count">[^\n]*\n\K.*?(?=\n\s*<div class="clear"></div>)' file
<div class="AA">
  <div class="A"><a href="/TEST">BBB</a>
  </div>
  <div class="BB"><span>CCC</span><br/><a href="/TEST1" class="B">DDD</a>
    <div></div><span>EEE</span><br/><img src="TEST2" title="C"/><a href="/TEST3" class="D">FFF</a>,
<a href="/TEST4" class="E">GGG</a>
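With `-z`, grep also terminates every match with a NUL byte instead of a newline. A sketch, assuming GNU grep built with PCRE support and a made-up sample, that pipes the matches through `tr` to get newline-separated output. It drops the leading `(?:\n|^)\s*` from the pattern, because some GNU grep releases reject an unescaped `^` or `$` when `-P` and `-z` are combined.

```shell
#!/bin/sh
# Made-up sample with two blocks; needs GNU grep with -P (PCRE) support.
sample=$(mktemp)
cat > "$sample" <<'EOF'
RUBBISH
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
      <div class="clear"></div>
RUBBISH
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
      <div class="clear"></div>
EOF

# -z reads NUL-separated records (here the whole file is one record),
# -P enables PCRE, (?s) lets . cross newlines, \K drops the results-count
# line from the reported match, and the lookahead stops before the next
# "clear" div. -o prints each match followed by NUL; tr turns NUL into \n.
result=$(grep -ozP '(?s)<div class="results-count">[^\n]*\n\K.*?(?=\n\s*<div class="clear"></div>)' "$sample" | tr '\0' '\n')
printf '%s\n' "$result"
rm -f "$sample"
```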


Avinash Raj