
I am trying to parse an HTML table in order to obtain its values. See here.

    <tr>
            <th>CLI:</th>
            <td>0044123456789</td>
    </tr>

    <tr>
            <th>Call Type:</th>
            <td>New Enquiry</td>
    </tr>

    <tr>
            <th class=3D"nopaddingtop">Caller's Name:</th>
            <td class=3D"nopaddingtop">&nbsp;</td>
    </tr>

    <tr>
            <th class=3D"nopaddingmid"></th>
            <td class=3D"nopaddingmid">Mr</td>
    </tr>

    <tr>
            <th class=3D"nopaddingmid"></th>
            <td class=3D"nopaddingmid">Lee</td>
    </tr>

    <tr>
            <th class=3D"nopaddingbot"></th>
            <td class=3D"nopaddingbot">Butler</td>
    </tr>

I want to read the values associated with "CLI", "Call Type", and "Caller's Name" into separate variables using sed / awk.

For example:

cli="0044123456789"
call_type="New Enquiry"
caller_name="Mr Lee Butler"

How can I do this?

Many thanks, Neil.

Neil Reardon
  • If it's valid HTML, I recommend using an XML parser like `xmllint`. – Cyrus Nov 15 '14 at 18:59
  • I agree about using an XML parser, but it's not clear whether you want to just find `CLI` (etc.) or the associated value(s) (`0044123456789`). Please update your question rather than answering in a comment. Good luck. – shellter Nov 15 '14 at 19:13

1 Answer


An example for the CLI value:

var=$(xmllint --html --xpath '//th[contains(., "CLI")]/../td/text()' file.html)
echo "$var"

For the part that spans multiple <tr> rows (the caller's name):

for key in {4..6}; do
    xmllint \
        --html \
        --xpath "//th[contains(., 'CLI')]/../../tr[$key]/td/text()" file.html
    printf ' '
done
echo

Output:

Mr Lee Butler
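
Since the question asks for sed / awk, here is a rough awk-only sketch as an alternative. It is not a real HTML parser: it assumes the input looks exactly like the sample (one <th> or <td> per line) and treats rows with an empty <th> as continuations of the previous label. The helper name `get_field` and the temporary sample file are made up for illustration.

```shell
#!/bin/sh
# Recreate the sample from the question in a temporary file (illustration only).
file=$(mktemp)
cat > "$file" <<'EOF'
<tr>
        <th>CLI:</th>
        <td>0044123456789</td>
</tr>
<tr>
        <th>Call Type:</th>
        <td>New Enquiry</td>
</tr>
<tr>
        <th class=3D"nopaddingtop">Caller's Name:</th>
        <td class=3D"nopaddingtop">&nbsp;</td>
</tr>
<tr>
        <th class=3D"nopaddingmid"></th>
        <td class=3D"nopaddingmid">Mr</td>
</tr>
<tr>
        <th class=3D"nopaddingmid"></th>
        <td class=3D"nopaddingmid">Lee</td>
</tr>
<tr>
        <th class=3D"nopaddingbot"></th>
        <td class=3D"nopaddingbot">Butler</td>
</tr>
EOF

# get_field LABEL FILE (hypothetical helper): print the <td> text for LABEL.
get_field() {
    awk -v want="$1" '
        /<th/ {
            s = $0
            gsub(/<[^>]*>/, "", s)           # drop the tags
            gsub(/^[ \t]+|[ \t]+$/, "", s)   # trim whitespace
            sub(/:$/, "", s)                 # drop the trailing colon
            if (s != "") key = s             # an empty <th> keeps the previous label
            next
        }
        /<td/ && key == want {
            s = $0
            gsub(/<[^>]*>/, "", s)
            gsub(/&nbsp;/, "", s)            # ignore placeholder cells
            gsub(/^[ \t]+|[ \t]+$/, "", s)
            if (s != "") out = (out == "" ? s : out " " s)
        }
        END { print out }
    ' "$2"
}

cli=$(get_field "CLI" "$file")
call_type=$(get_field "Call Type" "$file")
caller_name=$(get_field "Caller's Name" "$file")
printf '%s\n' "cli=$cli" "call_type=$call_type" "caller_name=$caller_name"
rm -f "$file"
```

If the real file has several cells per line, nested tables, or other entities, this will break, and the `xmllint` approach above is the safer choice.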
Gilles Quénot