I have an HTML file containing a table with a number of rows. A <TR> tag may have its corresponding </TR> on another line. For example, a.html contains the following:
<TABLE BORDER=1><TR><TH>col1</TH><TH>col2</TH><TH>col3</TH><TH>col4</TH></TR><TR><TD>aaa</TD><TD>bbb</TD><TD>ccc</TD><TD>ddd</TD></TR><TR><TD>eee</TD><TD>fff</TD><TD>ccc</TD><TD>mmm</TD></TR><TR><TD>jjj</TD><TD>kkk</TD><TD>lll</TD><TD>ssss</TD></TR>.........</TABLE>
Now I need to extract the content between the <TR> and </TR> tags (inclusive) into another HTML file, based on the value of a <TD> found between them.
For example, from a.html I need to create b.html containing only the rows whose third column is "ccc", while leaving a.html unchanged:
<TR><TD>aaa</TD><TD>bbb</TD><TD>ccc</TD><TD>ddd</TD></TR><TR><TD>eee</TD><TD>fff</TD><TD>ccc</TD><TD>mmm</TD></TR>
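A minimal Python sketch of the extraction described above, assuming the rows are not nested, every data cell is a plain <TD>...</TD>, and the match is on the trimmed cell text. The re.DOTALL flag lets "." cross newlines, so a row whose </TR> sits on a later line is still captured whole. The file names, column 3 and "ccc" come from the question; extract_rows and main are hypothetical helper names.

```python
import re
import sys

# A whole <TR>...</TR> block. DOTALL lets "." cross newlines, so a row whose
# </TR> sits on a later line is still captured as one match.
ROW_RE = re.compile(r"<TR\b.*?</TR>", re.IGNORECASE | re.DOTALL)
# The contents of each <TD> cell inside one row (header <TH> cells are skipped).
CELL_RE = re.compile(r"<TD\b[^>]*>(.*?)</TD>", re.IGNORECASE | re.DOTALL)


def extract_rows(html, column, value):
    """Return the raw <TR>...</TR> blocks whose 1-based `column` cell equals `value`."""
    matches = []
    for row in ROW_RE.findall(html):
        cells = CELL_RE.findall(row)
        if len(cells) >= column and cells[column - 1].strip() == value:
            matches.append(row)
    return matches


def main(src, dst, column=3, value="ccc"):
    # Read the source page (it is left unchanged) and write only the matching rows.
    with open(src, encoding="utf-8") as f:
        html = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write("".join(extract_rows(html, column, value)) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_rows.py a.html b.html
    main(sys.argv[1], sys.argv[2])
```

Run it as `python extract_rows.py a.html b.html`. The matching rows are copied byte for byte, so b.html keeps the original markup. A regex like this will misbehave on nested tables or on rows where the closing </TR> is omitted, which HTML allows.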
I am a newbie and know only a little about sed and awk. Can anyone help me get this done, or suggest a better way to do it easily?
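As an alternative to sed and awk, an HTML parser copes with line breaks, letter case and attributes without hand-written patterns. A minimal sketch using Python's standard-library html.parser.HTMLParser, assuming tables are not nested; RowCollector and filter_rows are hypothetical names. Note that it rebuilds each matching row from its cell text rather than copying the original bytes, so cell attributes and inner markup are not preserved.

```python
import html
import sys
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per completed row
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the current <td>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TR> arrives as "tr".
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        # Text outside a <td> (e.g. newlines between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def filter_rows(source, column, value):
    """Return rebuilt <TR> rows whose 1-based `column` cell text equals `value`."""
    parser = RowCollector()
    parser.feed(source)
    parser.close()
    out = []
    for cells in parser.rows:
        if len(cells) >= column and cells[column - 1] == value:
            tds = "".join(f"<TD>{html.escape(c, quote=False)}</TD>" for c in cells)
            out.append(f"<TR>{tds}</TR>")
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python filter_rows.py a.html b.html
    with open(sys.argv[1], encoding="utf-8") as f:
        rows = filter_rows(f.read(), 3, "ccc")
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        f.write("".join(rows) + "\n")
```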