I want to scrape the version number and update date of the Steam app. The version and date are in the "About this app" dialog. The current version of the Steam app is 2.3.13 and it was updated on Jun 1, 2021. Here is the link: play.google.com/store/apps/…
This is one terrible website to scrape, and a fragile endeavour at that. One small change to the website on their part and you can start all over again. Still, at the moment it is doable with a tool like xidel (an XML/HTML/JSON parser). With tools like sed (regex) I think this would be next to impossible. Please see 1732454, 590747 and 6751105 for examples of why it's a bad idea to parse HTML with regular expressions.
Okay. It looks like the date is the only thing that you can easily get from a <div>-node (I don't see the version string in the "About this app" dialog):
$ xidel -s "https://play.google.com/store/apps/details?id=com.valvesoftware.android.steam.community" \
-e '//div[@class="xg1aie"]'
Jun 1, 2021
The other stuff (including the date again) can be found in a <script>-node whose text-node is a (rather complicated) "pseudo" JSON:
$ xidel -s "https://play.google.com/store/apps/details?id=com.valvesoftware.android.steam.community" \
-e '//body/script[15]' \
--output-node-format=xml
A lot of <script>-nodes have the same nonce-attribute and no other identifying attribute, so we just have to select the one we're after by position: the 15th.
After stripping the surrounding JavaScript code, xidel, being a JSON parser as well, can parse the JSON:
$ xidel -s "https://play.google.com/store/apps/details?id=com.valvesoftware.android.steam.community" -e '
parse-json(
extract(//body/script[15],"AF_initDataCallback\((.+)\);",1),
{"liberal":true()}
)
'
Then you can grab the stuff you want from this JSON and do a string-concatenation, like for instance:
$ xidel -s "https://play.google.com/store/apps/details?id=com.valvesoftware.android.steam.community" -e '
parse-json(
extract(//body/script[15],"AF_initDataCallback\((.+)\);",1),
{"liberal":true()}
)/(data)(2)(3)/x"{.(1)()} {.(141)(1)()()} ({.(141)(3)()()})"
'
Steam 2.3.13 (Jun 1, 2021)
Here too it's rather fragile: you have to select the 141st element of an array because there's no other way to identify it.
x"..." is xidel's own "extended string syntax". With XPath's concat() function that would be:
$ xidel -s "https://play.google.com/store/apps/details?id=com.valvesoftware.android.steam.community" -e '
parse-json(
extract(//body/script[15],"AF_initDataCallback\((.+)\);",1),
{"liberal":true()}
)/(data)(2)(3)/concat(
.(1)()," ",.(141)(1)()(),
" (",.(141)(3)()(),")"
)
'
Steam 2.3.13 (Jun 1, 2021)
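Since both variants depend on positional indices (the 15th script, the 141st array element), it may be worth checking the result before using it in a script. The following is only a minimal sketch: `looks_valid` is a hypothetical helper name, the pattern is an assumption about what a sane result looks like ("name, dotted version, something in parentheses"), and `result` is a hard-coded stand-in for the xidel output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 has the shape
# "<name> <dotted version> (<something>)", e.g. "Steam 2.3.13 (Jun 1, 2021)".
looks_valid() {
  printf '%s\n' "$1" | grep -Eq '^.+ [0-9]+(\.[0-9]+)+ \(.+\)$'
}

# Stand-in for the output of the xidel command above.
result='Steam 2.3.13 (Jun 1, 2021)'

if looks_valid "$result"; then
  echo "ok: $result"
else
  echo "unexpected output, the page layout has probably changed" >&2
fi
```

If the page layout shifts and the indices start pointing at something else, a check like this should at least fail loudly instead of passing garbage along.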
As a bonus you can also grab the Epoch timestamp instead and convert it to a dateTime:
$ xidel -s "https://play.google.com/store/apps/details?id=com.valvesoftware.android.steam.community" -e '
parse-json(
extract(//body/script[15],"AF_initDataCallback\((.+)\);",1),
{"liberal":true()}
)/(data)(2)(3)/concat(
.(1)()," ",.(141)(1)()(),
" (",.(146)()(2)(1) * duration("PT1S") + dateTime("1970-01-01T00:00:00Z"),")"
)
'
Steam 2.3.13 (2021-06-01T17:33:25Z)
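The `* duration("PT1S") + dateTime("1970-01-01T00:00:00Z")` part simply adds that many seconds to the Unix epoch. As a cross-check outside xidel, the same conversion can be sketched in the shell. This assumes GNU `date` (BSD/macOS `date` takes `-r <seconds>` instead of `-d @<seconds>`); `epoch_to_iso` is a hypothetical helper name, and the number 1622568805 is back-calculated from the output above, not copied from the page.

```shell
#!/bin/sh
# Hypothetical helper: convert seconds since 1970-01-01T00:00:00Z
# to an ISO 8601 UTC timestamp. Assumes GNU date (coreutils).
epoch_to_iso() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# 1622568805 is derived from the dateTime printed above.
epoch_to_iso 1622568805
```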