I have just started learning sed. I want to extract and print the text between the > and < delimiters. Here is the text in my data file:
<span id="ctl00_ContentPlaceHolder1_lblRollNo">12029</span>
<br /><b>Engineering & IT/Computer Science</b><br />
<div id="ctl00_ContentPlaceHolder1_divEngITMerit">
<span id="ctl00_ContentPlaceHolder1_lblEngITSelListNo">3rd Provisional Selection List</span>
<tr><td style='width: 200px' class='TblTRData'>IT/Computer Science/Software</td><td style='width: 150px'class='TblTRData'>7 (out of 471)</td><td style='width: 325px'class='TblTRData'>Selected in MS COMPUTER SCIENCE</td></tr>
Name:
<span id="ctl00_ContentPlaceHolder1_lblName">SIDRA SHAHID</span>
Father Name:
<span id="ctl00_ContentPlaceHolder1_lblFatherName">SHAHID RAFEEQ AHMAD</span>
I have written the command:
sed -n -e '/^[^>]*>\([^<]*\)<.*/s//\1/p' myfile.txt
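Running the command on a single line of the sample shows where it goes wrong. The sketch below is a minimal illustration; the function name first_capture is hypothetical. On the table row, the first > is the one that closes <tr>, so \([^<]*\) captures an empty string just before <td, and the whole line is replaced by that empty capture. Because the pattern is anchored with ^ and ends in .*, only the first > ... < pair on a line can ever be captured.

```shell
#!/bin/sh
# Hypothetical helper: apply the original sed command to one line of input.
first_capture() {
  printf '%s\n' "$1" | sed -n -e '/^[^>]*>\([^<]*\)<.*/s//\1/p'
}

# The table row from the sample data.
row="<tr><td style='width: 200px' class='TblTRData'>IT/Computer Science/Software</td><td style='width: 150px'class='TblTRData'>7 (out of 471)</td><td style='width: 325px'class='TblTRData'>Selected in MS COMPUTER SCIENCE</td></tr>"

# The first '>' closes '<tr>', so the capture is empty and a blank line is printed.
first_capture "$row"
```

On the span line, first_capture '<span id="x">12029</span>' returns 12029, which matches what you observe.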
The problem is that it returns the text between only some of the > and < pairs. For example, it prints 12029, but not Selected in MS COMPUTER SCIENCE. What am I doing wrong?
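One possible way to print every pair, rather than only the first one on each line, is to let grep -o emit each >text< match on its own line and then trim the delimiters with sed. This is a sketch that assumes a grep supporting -o (GNU and BSD grep do; POSIX does not require it); the function name between_tags is hypothetical. Requiring at least one character between > and < skips empty gaps such as the one between <tr> and <td.

```shell
#!/bin/sh
# Hypothetical helper: print every non-empty run of text between '>' and the next '<'.
between_tags() {
  # grep -o prints each match on its own line; sed strips the leading '>' and trailing '<'.
  grep -o '>[^<][^<]*<' | sed -e 's/^>//' -e 's/<$//'
}

# Two lines copied from the sample data.
between_tags <<'EOF'
<span id="ctl00_ContentPlaceHolder1_lblRollNo">12029</span>
<tr><td style='width: 200px' class='TblTRData'>IT/Computer Science/Software</td><td style='width: 150px'class='TblTRData'>7 (out of 471)</td><td style='width: 325px'class='TblTRData'>Selected in MS COMPUTER SCIENCE</td></tr>
EOF
```

To run it on the whole file, use between_tags < myfile.txt. This relies on each text run sitting on a single line, as it does in the sample data.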