I have this HTML:
<article class="article column large-12 small-12 article--nyheter">
<a class="article__link" href="/nyheter/14343208/">
<div class="article__content">
<h2 class="article__title t54 tm24">Person har falt ned bratt terreng - luftambulanse er på vei</h2>
</div>
</a>
</article>
<article class="article column large-6 small-6 article--nyheter">
<a class="article__link" href="/nyheter/14341466/">
<figure class="image image__responsive" style="padding-bottom:42.075%;">
<img class="image__img lazyload" itemprop="image" title="" alt="" src="data:image/gif;base64,R0lGODlhEAAJAIAAAP///wAAACH5BAEAAAAALAAAAAAQAAkAAAIKhI+py+0Po5yUFQA7" />
</figure>
<div class="article__content">
<h2 class="article__title t34 tm24">Vil styrke innsatsen mot vold i nære relasjoner</h2>
</div>
</a>
</article>
I want to extract only those HTML elements, in this case the article tags, that have a child img tag inside them.
I have this sed command, which extracts every article block:
sed -n '/<article class.*article--nyheter/,/<\/article>/p' onlyArticlesWithOutSpace.html > test.html
What I am trying to achieve now is to keep only those article tags that have an img tag inside them.
The output I want is this:
<article class="article column large-6 small-6 article--nyheter">
<a class="article__link" href="/nyheter/14341466/">
<figure class="image image__responsive" style="padding-bottom:42.075%;">
<img class="image__img lazyload" itemprop="image" title="" alt="" src="data:image/gif;base64,R0lGODlhEAAJAIAAAP///wAAACH5BAEAAAAALAAAAAAQAAkAAAIKhI+py+0Po5yUFQA7" />
</figure>
<div class="article__content">
<h2 class="article__title t34 tm24">Vil styrke innsatsen mot vold i nære relasjoner</h2>
</div>
</a>
</article>
I cannot use any XML/HTML parser; I am only looking to use sed, grep, awk, etc.
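To make the goal concrete, here is a minimal sketch of the kind of filter I have in mind, written with awk. It buffers each article block and prints it only if an img tag was seen before the closing tag. The function name `articles_with_img` is made up for illustration, and the sketch assumes that every opening `<article ...>` tag and every `</article>` sits on its own line, as in the sample above; it is not a general HTML solution.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print only the
# <article ... article--nyheter> blocks that contain an <img tag.
# Assumes opening and closing article tags are each on their own line.
articles_with_img() {
  awk '
    # Opening tag: start a fresh buffer for this article.
    /<article class.*article--nyheter/ { inside = 1; buf = ""; has_img = 0 }

    # While inside an article, collect lines and remember if an img was seen.
    inside {
      buf = buf $0 "\n"
      if ($0 ~ /<img/) has_img = 1
    }

    # Closing tag: emit the buffered block only when it contained an img.
    inside && /<\/article>/ {
      if (has_img) printf "%s", buf
      inside = 0
    }
  ' "$@"
}

# Example call (file names taken from the sed command above):
#   articles_with_img onlyArticlesWithOutSpace.html > test.html
```

With no file argument the function reads from standard input, so it can also be placed at the end of a pipeline.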