1

I have the page http://www.cpubenchmark.net/cpu_list.php and want to extract specific CPUs along with their name, rank, and benchmark score.

Example ("Intel Core i5"):

Intel Core i5-3450 @ 3.10GHz - Score: 3333 - Rank: 1
Intel Core i5-3450S @ 2.80GHz - Score: 2222 - Rank: 2
Intel Core i5-2380P @ 3.10GHz - Score: 1111 - Rank: 3
...

How can I do that in bash? I tried to start with something like this (without CPU filtering, since I don't know how to do that):

#!/bin/sh
curl http://www.cpubenchmark.net/cpu_list.php | grep '^<TR><TD>' \
| sed \
    -e 's:<TR>::g'  \
    -e 's:</TR>::g' \
    -e 's:</TD>::g' \
    -e 's:<TD>: :g' \
| cut -c2- >> /home/test.txt

The output looks something like this:

<A HREF="cpu_lookup.php?cpu=686+Gen&amp;id=1495">686 Gen</A> 288 1559 NA NA
<A HREF="cpu_lookup.php?cpu=AMD+A10-4600M+APU&amp;id=10">AMD A10-4600M APU</A> 3175 388 NA NA
<A HREF="cpu_lookup.php?cpu=AMD+A10-4655M+APU&amp;id=11">AMD A10-4655M APU</A> 3017 406 NA NA
Brian Tompsett - 汤莱恩
recon

2 Answers

4

If you don't mind downloading an additional program, you can use my Xidel:

All CPUs:

xidel http://www.cpubenchmark.net/cpu_list.php -e '//table[@id="cputable"]//tr/concat(td[1], " - Score: ", td[2], " - Rank: ", td[3])'

Only those starting with "Intel Core i5":

xidel http://www.cpubenchmark.net/cpu_list.php -e '//table[@id="cputable"]//tr[starts-with(td[1], "Intel Core i5")]/concat(td[1], " - Score: ", td[2], " - Rank: ", td[3])'

It can even sort them by rank (I had never used that feature before):

xidel http://www.cpubenchmark.net/cpu_list.php -e 'for $row in //table[@id="cputable"]//tr[starts-with(td[1], "Intel Core i5")] order by $row/td[3] return $row/concat(td[1], " - Score: ", td[2], " - Rank: ", td[3])' --extract-kind=xquery
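Sorting can also be done in the shell after extraction. A minimal sketch, assuming every line ends with the numeric rank as in the "Name - Score: S - Rank: R" format used above; `sort_by_rank` is an illustrative name, not part of Xidel:

```shell
#!/bin/sh

# Sort "Name - Score: S - Rank: R" lines numerically by the trailing rank.
# Assumption: the rank is the last whitespace-separated field of each line.
sort_by_rank() {
    # Prefix each line with its rank and a tab, sort on that number,
    # then strip the prefix again.
    awk '{ print $NF "\t" $0 }' | sort -n | cut -f2-
}
```

Piping the output of either of the first two xidel commands into `sort_by_rank` should then order the lines by rank.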
BeniBela
  • That's amazing. Never heard about Xidel. Thanks for that nice solution. Works as expected. – recon Dec 29 '12 at 00:29
  • Well, I had it sitting as fifth-complete library on my harddisk for almost four years, before I told anyone about it... – BeniBela Dec 29 '12 at 00:33
0

A bash solution tailored strictly to the current format of the page:

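#!/bin/bash

# Reads the page from standard input (pipe the curl output into this script).

# Store the text of the next table cell in $cell and drop that cell from $line.
function nextcell
{
    cell=${line%%</TD>*}
    # remove closing link tag if any
    cell=${cell%</?>}
    cell=${cell##*>}
    line=${line#*</TD>}
}

while IFS= read -r line
do
    # skip every line that is not a CPU row
    if [[ ! "$line" =~ cpu_lookup.php ]]
    then
        continue
    fi
    nextcell
    echo -n "$cell"
    nextcell
    echo -n " - Score: $cell"
    nextcell
    echo " - Rank: $cell"
done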
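The script above reads the page from standard input and prints every CPU. A minimal sketch of how filtering by name could be added, assuming rows keep the layout implied by the question (name link, score, and rank in the first three `<TD>` cells of one line); `format_row`, `filter_cpus`, and the prefix argument are illustrative names, not part of the original script:

```shell
#!/bin/sh

# Hypothetical helper: turn one table row into "Name - Score: S - Rank: R".
format_row() {
    rest=$1
    result=
    for label in '' ' - Score: ' ' - Rank: '; do
        cell=${rest%%</TD>*}   # text up to the first </TD>
        cell=${cell%</?>}      # drop a trailing </A>, if any
        cell=${cell##*>}       # keep only what follows the last '>'
        rest=${rest#*</TD>}    # advance to the next cell
        result=$result$label$cell
    done
    printf '%s\n' "$result"
}

# Hypothetical filter: print only CPUs whose name starts with the given prefix.
filter_cpus() {
    prefix=$1
    while IFS= read -r row; do
        # skip every line that is not a CPU row
        case $row in
            *cpu_lookup.php*) ;;
            *) continue ;;
        esac
        formatted=$(format_row "$row")
        case $formatted in
            "$prefix"*) printf '%s\n' "$formatted" ;;
        esac
    done
}
```

With these functions defined, something like `curl -s http://www.cpubenchmark.net/cpu_list.php | filter_cpus "Intel Core i5"` should print only the matching CPUs, as long as the page format has not changed.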
Jester