I have the following table stored in a file, and I want to run a Linux command that extracts certain data from it.

<td class="left"><a title="HSBC Holdings PLC" href="http://tools.morningstar.co.uk/t92wz0sj7c/@REPORT/default.aspx?externalid=GB0005405286&externalidtype=ISINMIC&externalidmic=XLON">HSBC Holdings PLC</a></td><td>716.60</td><td>30.30</td><td>4.41</td><td>686.40</td><td>^M

How can I run a command that prints just the company name and the price, which is the first number (716.60)?

I have tried using sed, but I cannot get it to work.

B.Hunt
  • Use an HTML parser, not regex. – Sundeep Jun 29 '17 at 09:35
  • The canonical explanation why you shouldn't try to use regular expressions to parse HTML is "[RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)". General HTML is not parsable by regular expressions. – AlexP Jun 29 '17 at 09:45
  • Please post the extended structure, including the `table` tag. – RomanPerekhrest Jun 29 '17 at 10:38

1 Answer

It's stupid, but it works:

awk -F'(</a></td><td>|</td>)' '{ sub(/.*>/, "", $1); print $1, $2 }' sample.txt
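Since the question mentions sed, here is a rough sed sketch of the same idea. It assumes every row sits on a single line and has exactly the shape of the sample (one link followed by the price cell). The function name `extract_name_price` is made up for illustration, and the first command simply recreates `sample.txt` with the row from the question, trailing carriage return included. As the comments point out, an HTML parser is the more robust choice for anything beyond this fixed layout.

```shell
# Recreate the sample row from the question (note: this overwrites sample.txt).
printf '%s\r\n' '<td class="left"><a title="HSBC Holdings PLC" href="http://tools.morningstar.co.uk/t92wz0sj7c/@REPORT/default.aspx?externalid=GB0005405286&externalidtype=ISINMIC&externalidmic=XLON">HSBC Holdings PLC</a></td><td>716.60</td><td>30.30</td><td>4.41</td><td>686.40</td><td>' > sample.txt

# Hypothetical helper: print "<link text> <first cell after the link>" per matching line.
# -n plus the trailing p prints only lines where the substitution succeeded.
# '#' is used as the delimiter so the slashes in the closing tags need no escaping.
extract_name_price() {
  sed -nE 's#.*<a [^>]*>([^<]*)</a></td><td>([^<]*)</td>.*#\1 \2#p' "$1"
}

extract_name_price sample.txt
# prints: HSBC Holdings PLC 716.60
```

The final `.*` also swallows the trailing carriage return, so it does not end up in the output.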
arhu