I'm trying to write a Linux bash script that downloads an HTML page, extracts numbers from it, and assigns them to variables.

The HTML page has several lines, but I'm interested in these:

<tr>
      <td width="16"><img src="img/ico_message.gif"></td>
      <td width="180"><strong> TIME 1</strong></td>
      <td width="132">
        <div align="right"><strong>61</strong></div></td>
    </tr>
    <tr>
      <td width="16"><img src="img/ico_message.gif"></td>
      <td width="180"><strong> TIME 2</strong></td>
      <td width="132">
        <div align="right"><strong>65</strong></div></td>
    </tr>
  </table></td>

Every time I download the page I have to read the two values in rows 5 and 11, between `<strong>` and `</strong>` (61 and 65 in this example, but they are different each time).

The two extracted values must then be assigned to two variables.

Thanks for any ideas.

  • 2
    Add the command combinations which have tried so far. – nandal Jun 28 '18 at 11:09
  • 1
    Bash is not the right tool for the job. I'd use an HTML-aware tool ([xsh](http://metacpan.org/pod/distribution/XML-XSH2/xsh) in my case) if the markup isn't too broken, or [HTML::TableExtract](http://p3rl.org/HTML::TableExtract) in Perl. – choroba Jun 28 '18 at 11:16
  • 2
    You should use an `xpath` utility to parse xml/html. There are command line `xpath` tools you can invoke from a bash script. – ccarton Jun 28 '18 at 11:22
  • 1
    Welcome to Stack Overflow! Sorry, this is not the way StackOverflow works. Questions of the form "I want to do X, please give me tips and/or sample code" are considered off-topic. Please visit the [help] and read [ask], and especially read [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236) – kvantour Jun 28 '18 at 11:58
  • 2
    Have a look at [this](https://stackoverflow.com/a/50713910/8344060) answer which shows how to extract links from an html using Xpath. And look at [this](http://www.zvon.org/xxl/XPathTutorial/General/examples.html) page to understand Xpath. With these two, I am 100% convinced you can do it ;-). If you still don't manage, please post your efforts here and we gladly help you out. – kvantour Jun 28 '18 at 12:02

2 Answers

Let's assume we have a page called page.html. You can first select the matching lines with grep, then extract the values with sed, and finally pick out each value in turn with awk:

$ var0=$(grep -oE "<strong>[0-9]+</strong>" page.html |
    sed -E "s/<strong>([0-9]+)<\/strong>/\1/" |
    awk 'NR%2==1')

$ var1=$(grep -oE "<strong>[0-9]+</strong>" page.html |
    sed -E "s/<strong>([0-9]+)<\/strong>/\1/" |
    awk 'NR%2==0')

output:

$ echo $var0
61
$ echo $var1
65
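
The same idea can be sketched so that the page is scanned only once and each value is then picked by its position. This is a minimal sketch, not a definitive implementation: `page_file` and the embedded sample are hypothetical stand-ins for the downloaded page, the `url` in the commented-out `curl` line is a placeholder, and it assumes a grep that supports `-o` (GNU or BSD).

```shell
# Hypothetical sample standing in for the downloaded page.
page_file=$(mktemp)
cat > "$page_file" <<'EOF'
<tr>
      <td width="16"><img src="img/ico_message.gif"></td>
      <td width="180"><strong> TIME 1</strong></td>
      <td width="132">
        <div align="right"><strong>61</strong></div></td>
    </tr>
    <tr>
      <td width="16"><img src="img/ico_message.gif"></td>
      <td width="180"><strong> TIME 2</strong></td>
      <td width="132">
        <div align="right"><strong>65</strong></div></td>
    </tr>
EOF
# In a real script the file would come from the download step instead,
# e.g. (url is a placeholder):  curl -s "$url" -o "$page_file"

# Keep only <strong> cells that contain nothing but digits,
# then strip the tags so one number remains per line.
values=$(grep -oE '<strong>[0-9]+</strong>' "$page_file" | grep -oE '[0-9]+')

# Pick the first and second number by position.
var0=$(printf '%s\n' "$values" | sed -n '1p')
var1=$(printf '%s\n' "$values" | sed -n '2p')

echo "$var0 $var1"   # prints: 61 65

rm -f "$page_file"
```

Selecting by position assumes the page always lists the two numeric cells in the same order; if other all-digit `<strong>` cells appear before them, the positions would need adjusting.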
Ulises Rosas-Puchuri

This might work for you (GNU sed):

sed -rn '/TIME/{:a;N;5bb;11bb;ba;:b;s/.*TIME ([^<]*).*<strong>([^<]*).*/var\1=\2/p}' file

The command uses the integer that follows TIME to name each variable, so it prints lines of the form var1=61 and var2=65. Note that it relies on the values sitting on lines 5 and 11 of the file.
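
To turn such `varN=value` lines into actual shell variables, one option is to capture the output and `eval` it. The sketch below is a variant written for illustration, not the command above: it assumes GNU sed, uses a hypothetical `page_file` sample, and loops until a numeric `<strong>` cell appears instead of relying on fixed line numbers. The regex only accepts digits, which is what makes `eval` tolerable here.

```shell
# Hypothetical sample standing in for the downloaded page.
page_file=$(mktemp)
cat > "$page_file" <<'EOF'
<tr>
      <td width="16"><img src="img/ico_message.gif"></td>
      <td width="180"><strong> TIME 1</strong></td>
      <td width="132">
        <div align="right"><strong>61</strong></div></td>
    </tr>
    <tr>
      <td width="16"><img src="img/ico_message.gif"></td>
      <td width="180"><strong> TIME 2</strong></td>
      <td width="132">
        <div align="right"><strong>65</strong></div></td>
    </tr>
EOF

# From each "TIME n" line, keep appending lines (N) until a numeric
# <strong> cell shows up, then rewrite the buffer as "varn=value".
assignments=$(sed -nE '/TIME [0-9]+/{:a;N;/<strong>[0-9]+<\/strong>/!ba;s/.*TIME ([0-9]+).*<strong>([0-9]+)<\/strong>.*/var\1=\2/p}' "$page_file")

# The output can only be lines like var1=61, so evaluating it
# simply performs the assignments.
eval "$assignments"

echo "$var1 $var2"   # prints: 61 65

rm -f "$page_file"
```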

potong