An HTML document can contain multiple <h2> tags like the ones below:
<h2>
<a id="id1" name="name1"></a>
test1
</h2>
<h2>
<a id="id2" name="name2"></a>
test2
</h2>
I am using awk to iterate over all <h2> tags in the document and get the inner HTML of each one, like this:
file='/var/www/html/test.html'
# RS="^$" makes GNU awk read the whole file as a single record
awk -F" *</?h2> *\n?" -v RS="^$" '{
  for (i = 2; i <= NF; i += 2)
  {
    printf "%s", $i
    # parse $i to get the "id" and the "text"
    # arr[id] = value   <- need to do something here
  }
}' "$file"
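To see why the loop starts at field 2 and steps by 2, it can help to print every field that the separator regex produces. This is a sketch under a few assumptions: dump_h2_fields is a hypothetical helper name, the sample HTML is fed through a here-doc instead of being read from /var/www/html/test.html, and the input is collected line by line and cut with split() rather than relying on gawk's RS="^$", so it should behave the same in any POSIX awk.

```shell
# Hypothetical helper: print each field produced by the question's separator regex.
dump_h2_fields() {
  awk '
    { doc = doc $0 "\n" }                    # collect all input lines into one string
    END {
      n = split(doc, f, " *</?h2> *\n?")     # same regex the question passes to -F
      for (i = 1; i <= n; i++)
        printf "field %d: [%s]\n", i, f[i]   # brackets make empty fields visible
    }'
}

# Sample input inlined instead of reading the real file (assumption).
dump_h2_fields <<'EOF'
<h2>
<a id="id1" name="name1"></a>
test1
</h2>
<h2>
<a id="id2" name="name2"></a>
test2
</h2>
EOF
```

With the sample input, fields 1, 3 and 5 come out empty (the text before the first <h2>, between a </h2> and the next <h2>, and after the last </h2>), while fields 2 and 4 hold the inner HTML of each heading.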
and I get output like this:
<a id="id1" name="name1"></a>
test1
<a id="id2" name="name2"></a>
test2
Now I want to parse the anchor inside the awk loop to get the id as the key and the description (for example, test1) as the value.
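One way the id and the text could be extracted inside the loop is sketched below. It assumes each <h2> holds a single anchor written as id="..." with double quotes, followed by plain text. h2_to_dict is a hypothetical helper name; it uses the portable two-argument match() with RSTART/RLENGTH instead of gawk's three-argument match(), slurps the input without gawk's RS="^$", and, because awk arrays have no defined iteration order, keeps a separate list of keys so the result follows document order.

```shell
# Hypothetical helper: build arr[id] = text inside awk and print it dict-style.
h2_to_dict() {
  awk '
    { doc = doc $0 "\n" }                    # slurp all input lines (portable)
    END {
      n = split(doc, f, " *</?h2> *\n?")     # same separator the question uses with -F
      count = 0
      for (i = 2; i <= n; i += 2) {          # even fields = inner HTML of each <h2>
        if (!match(f[i], /id="[^"]*"/)) continue
        id = substr(f[i], RSTART + 4, RLENGTH - 5)   # drop leading id=" and trailing "
        text = f[i]
        gsub(/<[^>]*>/, "", text)            # remove the <a ...></a> tags
        sub(/^[ \t\n]+/, "", text)           # trim leading whitespace/newlines
        sub(/[ \t\n]+$/, "", text)           # trim trailing whitespace/newlines
        arr[id] = text                       # id as key, description as value
        keys[++count] = id                   # remember document order
      }
      out = "{"
      for (k = 1; k <= count; k++) {
        sep = (k > 1) ? "," : ""
        out = out sep "\047" keys[k] "\047:\047" arr[keys[k]] "\047"   # \047 is a single quote
      }
      print out "}"
    }' "$@"
}
```

For the sample document above, h2_to_dict /var/www/html/test.html should print {'id1':'test1','id2':'test2'}.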
That way, if I access the array as ${arr[@]} outside of awk, I should get output something like:
{'id1':'test1','id2':'test2'}
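An array filled inside awk cannot be read afterwards as ${arr[@]} from the shell: awk runs as a separate process and its variables disappear when it exits. A common workaround, sketched here assuming bash 4 or newer (for declare -A) and ids/texts that contain no tabs or newlines, is to have awk print id<TAB>text lines and read them into a bash associative array. load_h2_map is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: awk emits "id<TAB>text" lines; bash stores them in an associative array.
declare -A arr                               # global, so the function below fills it

load_h2_map() {                              # hypothetical helper; $1 = HTML file
  local id text
  arr=()                                     # reset; keeps the associative attribute
  while IFS=$'\t' read -r id text; do
    arr[$id]=$text
  done < <(awk '
    { doc = doc $0 "\n" }                    # slurp the file (portable)
    END {
      n = split(doc, f, " *</?h2> *\n?")
      for (i = 2; i <= n; i += 2) {          # even fields = inner HTML of each <h2>
        if (!match(f[i], /id="[^"]*"/)) continue
        id = substr(f[i], RSTART + 4, RLENGTH - 5)
        text = f[i]
        gsub(/<[^>]*>/, "", text)            # strip the anchor tags
        sub(/^[ \t\n]+/, "", text); sub(/[ \t\n]+$/, "", text)
        print id "\t" text
      }
    }' "$1")
}

file='/var/www/html/test.html'
if [ -r "$file" ]; then
  load_h2_map "$file"
  for id in "${!arr[@]}"; do                 # "${!arr[@]}" = keys, "${arr[@]}" = values
    printf '%s -> %s\n' "$id" "${arr[$id]}"
  done
fi
```

Keep in mind that "${arr[@]}" expands only to the values and "${!arr[@]}" only to the keys, in no guaranteed order; bash does not print an associative array in the {'id1':'test1',...} form by itself, so that format has to be assembled explicitly, for example with printf in a loop over the keys.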