How to find specific values in a table from an HTML file and print them with a Linux command

Question

I have the following table stored in a file, and I want to run a Linux command that extracts certain data from it.

<td class="left"><a title="HSBC Holdings PLC" href="http://tools.morningstar.co.uk/t92wz0sj7c/@REPORT/default.aspx?externalid=GB0005405286&externalidtype=ISINMIC&externalidmic=XLON">HSBC Holdings PLC</a></td><td>716.60</td><td>30.30</td><td>4.41</td><td>686.40</td><td>^M

How can I run a command that prints just the name of the company and the price, which is the first number (716.60)?

I have tried using sed, but I cannot get it to work.

The canonical explanation why you shouldn't try to use regular expressions to parse HTML is "[RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)". General HTML is not parsable by regular expressions. — AlexP, Jun 29 '17 at 09:45

score 0 · Answer 1 · answered Jun 29 '17 at 10:47

It's stupid, but it works:

cat -n sample.txt | awk -v FS="(</a></td><td>|</td>)" '{print $2}'
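
The command above prints only the second field, which is the price. A minimal sketch of the same field-separator idea that also prints the company name might look like the following. It assumes each table row sits on a single line shaped like the one in the question; the `extract_name_price` function name and the shortened `href` in the sample file are hypothetical, used only for illustration. As the comment on the question notes, this is pattern matching rather than real HTML parsing, so it will break if the markup changes.

```shell
# Hypothetical sample file holding one row of the table (href shortened for this sketch).
cat > sample.txt <<'EOF'
<td class="left"><a title="HSBC Holdings PLC" href="http://example.invalid/report">HSBC Holdings PLC</a></td><td>716.60</td><td>30.30</td><td>4.41</td><td>686.40</td><td>
EOF

# Split each line on "</a></td><td>" or "</td>":
#   $1 ends with the link text (the company name), $2 is the first number (the price).
extract_name_price() {
    awk -F'</a></td><td>|</td>' '
        NF >= 2 {
            name = $1
            sub(/.*>/, "", name)   # drop everything up to the last ">" before the name
            print name "\t" $2     # name and price, separated by a tab
        }
    ' "$1"
}

extract_name_price sample.txt
```

The `NF >= 2` guard skips lines that do not contain either separator, so unrelated lines in the file produce no output.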

arhu
