Original question
My initial attempt was to run curl https://stackoverflow.com/users/5825294/enlico and pipe the result into sed/awk. However, as I've frequently read, sed and awk are not the best tools to parse HTML code. Furthermore, the above URL changes if I change my user name.
Here is my quick attempt with sed, written on multiple lines for readability:
curl https://stackoverflow.com/users/5825294/enlico 2> /dev/null | sed -nE '
/title="reputation"/,/bronze badges/{
/"reputation"/{
N
N
s!.*>(.*)</.*!\1!p
}
/badges/s/.*[^1-9]([1-9]+[0-9]*,*[0-9]* (gold|silver|bronze) badges).*/\1/p
}'
which prints
10,968
5 gold badges
27 silver badges
56 bronze badges
Obviously this script relies heavily on the peculiar structure of the specific HTML page, the most notable example being that I run N twice because I've verified that the reputation is two lines below the first line in the file containing "reputation".
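To make the dependence on line positions concrete, here is a minimal sketch on made-up input. The four sample lines and the function name demo_two_n are assumptions for illustration and are not taken from the real page. Each N appends the next input line to the pattern space, so after two of them the substitution can drop everything up to the last embedded newline and keep only the third line:

```shell
# Hypothetical demo: the sample lines stand in for the real HTML.
demo_two_n() {
  printf '%s\n' 'title="reputation"' 'filler line' '10,968' 'unrelated' |
    sed -n '/reputation/{
N
N
s/.*\n//p
}'
}
```

Calling demo_two_n prints 10,968. If a line were added or removed between the marker and the number, the same script would print a different line, which is exactly the fragility described above.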
Update based on the answers
Léa Gris' answer almost answers my question. The missing bit is that I have 5 gold, 27 silver, and 56 bronze badges, not 5, 18, 7.
In this respect, I've noticed that 18 is the number of silver badges I have if I don't count those awarded multiple times. I therefore played around with jq and discovered that I can query for the award_count beside the rank, and I thought I could use that to take badges awarded multiple times into account. This kind of works, in the sense that running the following (fetch_user_badges is from Léa Gris' answer) generates the correct number of silver badges but the wrong number of bronze badges:
$ fetch_user_badges stackoverflow 5825294 | jq -r '
.items
| map({rank: .rank, count: .award_count})
| group_by(.rank)
| map([[.[0].rank],map(.count) | add])'
[
[
"bronze",
22
],
[
"gold",
5
],
[
"silver",
27
]
]
Does anybody know why that is?
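For reference, the aggregation in the jq filter above can be sketched with the precedence spelled out. In jq the comma binds more tightly than the pipe, so [[.[0].rank],map(.count) | add] applies add both to the one-element rank array and to the array of counts, which happens to give the intended pairs. The version below sums award_count per rank explicitly. The function name sum_awards_by_rank is made up for illustration, and the sketch assumes its input has the same .items shape as the output of fetch_user_badges:

```shell
# Hypothetical helper: reads badge JSON on stdin and prints one
# "rank total" line per rank, in sorted rank order (group_by sorts).
sum_awards_by_rank() {
  jq -r '
    .items
    | group_by(.rank)
    | .[]
    | "\(.[0].rank) \(map(.award_count) | add)"'
}
```

It would be used as fetch_user_badges stackoverflow 5825294 | sum_awards_by_rank. Note that it sums only the items present in its input, so the totals can be no more complete than the data it is given.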