I have the following HTML structure and want to extract data from it using awk.
<body>
<div>...</div>
<div>...</div>
<div class="body-content">
<div>...</div>
<div class="product-list" class="container">
<div class="w3-row" id="product-list-row">
<div class="w3-col m2 s4">
<div class="product-cell">
<div class="product-title">Product A</div>
<div class="product-price">100,56</div>
</div>
</div>
<div class="w3-col m2 s4">
<div class="product-cell">
<div class="product-title">Product B</div>
<div class="product-price">200,56</div>
</div>
</div>
<div class="w3-col m2 s4">
<div class="product-cell">
<div class="product-title">Product C</div>
<div class="product-price">300,56</div>
</div>
</div>
<div class="w3-col m2 s4">
<div class="product-cell">
<div class="product-title">Product D</div>
<div class="product-price">400,56</div>
</div>
</div>
</div>
</div>
</div>
</body>
The result I want to have is as follows.
100,56
200,56
300,56
400,56
I was experimenting with the following awk script (I know it makes no sense to select product-price twice; I was about to modify this script):
awk -F '<[^>]+>' 'found { sub(/^[[:space:]]*/,";"); print title $0; found=0 } /<div class="product-price">/ { title=$2 } /<div class="product-price">/ { found=1 }'
but it gives me the result
100,56 </div>
200,56 </div>
300,56 </div>
400,56 </div>
I have never used awk before, so I can't figure out what is wrong here or how to modify the above code. How would you do this?
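For reference, here is a minimal sketch of the behavior I am after. It assumes each price sits on a single line together with its opening and closing tags, as in the sample above. The `html` and `prices` variables are only illustrative names, and the input is a trimmed copy of the markup. With tags as the field separator, the text between the two tags should be field 2, so printing `$2` on the matching line itself (rather than on the following line) appears to be enough:

```shell
# Trimmed copy of the product markup from the question (illustrative input).
html='<div class="product-cell">
<div class="product-title">Product A</div>
<div class="product-price">100,56</div>
</div>
<div class="product-cell">
<div class="product-title">Product B</div>
<div class="product-price">200,56</div>
</div>'

# -F '<[^>]+>' treats every HTML tag as a field separator.
# On a product-price line, $1 is whatever precedes the opening tag
# (empty or indentation) and $2 is the text between the two tags.
prices=$(printf '%s\n' "$html" |
  awk -F '<[^>]+>' '/<div class="product-price">/ { print $2 }')

printf '%s\n' "$prices"
```

This is not a real HTML parser: it would likely break if a price element spanned several lines or if the attributes appeared in a different form, so it only fits markup as regular as the sample.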