
I am trying to trim down an output in some code I'm working on, and for whatever reason can't get it to work.

version= wget --output-document=- https://dolphin-emu.org/download 2>/dev/null \
    | grep 'version always-ltr' -m 1
until [[ "${version::2}" == "." ]];
    do version= echo "$version" | sed 's/^.//'
done
until [[ "${version: -1}" -ge "0" ]];
    do version= echo "$version" | sed 's/.$//'
done
echo $version

Initially, $version equals something long and clunky:

<td class="version always-ltr"><a href="/download/dev/8ecfa537a242de74d2e372e30d9d79b14584b2fb/">5.0-16101</a></td>

However, I only want the 5.0-xxxxx number. How do I do that? (Or what absolutely idiotic mistake am I making?)

markp-fuso
  • wrong format for assigning the output from a command to a variable; try `version=$(echo "$version" | sed 's/^.//')` (no spaces on either side of the `=`); there are other ways to extract the desired number but see if you can get your current code working first ... and assuming it now works, please update the question with a) the latest version of your code and b) the (wrong?) output generated by your code – markp-fuso Apr 26 '22 at 20:23
  • Perhaps `wget -q -O- https://dolphin-emu.org/download | sed -n 's~.*version always-ltr.*>\(.*\)$~\1~p'` – M. Nejat Aydin Apr 26 '22 at 20:48
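
To illustrate the point of the first comment, here is a minimal sketch of the difference between the two forms. The sample string is a short made-up stand-in, and the names `after_broken` and `after_fixed` are hypothetical, used only for the demonstration.

```shell
#!/usr/bin/env bash
# Short made-up stand-in for the real HTML line.
version='<td>5.0</td>'

# Form used in the question: "version=" followed by a space is a temporary
# empty assignment that applies only to the echo command. The pipeline
# output is not captured, so $version is left unchanged.
version= echo "$version" | sed 's/^.//' >/dev/null
after_broken=$version

# Command substitution: the pipeline output is stored in the variable.
version=$(echo "$version" | sed 's/^.//')
after_fixed=$version

echo "broken form leaves: $after_broken"
echo "fixed form yields:  $after_fixed"
```

With the broken form the variable never changes, which is why an `until` loop that waits for `$version` to change would not terminate.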

2 Answers


Setting aside syntax issues with the current code, assuming an HTML/XML-aware tool (eg, xmllint) is not available, and assuming the contents of $version always look like OP's example (eg, no embedded linefeeds) ...

One idea using a bash regex test and pulling the version from the BASH_REMATCH[] array:

$ regex='>([^<>]+)<'             # '>' followed by 1+ characters, other than '<' and '>', followed by '<'

$ [[ "${version}" =~ $regex ]] && version_num="${BASH_REMATCH[1]}"
$ echo "${version_num}"
5.0-16101

$ typeset -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]=">5.0-16101<" [1]="5.0-16101")
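
The same regex test could be wrapped in a small function so that a line without a match is reported rather than silently ignored. This is only a sketch under the same assumptions as above; the function name `extract_version` is hypothetical and not part of the original answer.

```shell
#!/usr/bin/env bash

# extract_version (hypothetical helper): prints the first run of text that
# sits between '>' and '<', or returns 1 if there is no such run.
extract_version() {
    local html=$1
    local regex='>([^<>]+)<'    # same pattern as in the answer above

    if [[ $html =~ $regex ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Sample line copied from the question.
version='<td class="version always-ltr"><a href="/download/dev/8ecfa537a242de74d2e372e30d9d79b14584b2fb/">5.0-16101</a></td>'

if version_num=$(extract_version "$version"); then
    echo "$version_num"
else
    echo "no version found" >&2
fi
```

Keeping the regex in a variable and leaving it unquoted on the right-hand side of `=~` ensures bash treats it as a pattern rather than a literal string.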
markp-fuso

If, as you show, your version is of the form:

version='<td class="version always-ltr"><a href="/download/dev/8ecfa537a242de74d2e372e30d9d79b14584b2fb/">5.0-16101</a></td>'

A simple sed expression that captures the wanted value and substitutes it back in through the first backreference is all that is needed, e.g.

$ echo "$version" | sed 's/^.*">\([^<][^<]*\).*$/\1/'
5.0-16101

Here the greedy .* matches from the beginning of the string up to the final ">, then \([^<][^<]*\) captures the wanted text (one or more characters other than <), and \1 reinserts it as the replacement for the whole line.

To capture the result in a variable, use command substitution, var=$(command), e.g.

ver=$(echo "$version" | sed 's/^.*">\([^<][^<]*\).*$/\1/')
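
Putting the pieces together, a hedged end-to-end sketch might look like the following. The `page` variable is a stand-in for the downloaded HTML (its second row, including the `0000` path and the `5.0-16100` number, is made up for illustration); in a real script the `printf` would be replaced by the `wget` command from the question. Adding `-n` and the `p` flag is a variation on the expression above, so that a line that does not match produces no output instead of being passed through unchanged.

```shell
#!/usr/bin/env bash

# Stand-in for the downloaded page; the second row is invented test data.
page='<tr>
<td class="version always-ltr"><a href="/download/dev/8ecfa537a242de74d2e372e30d9d79b14584b2fb/">5.0-16101</a></td>
<td class="version always-ltr"><a href="/download/dev/0000/">5.0-16100</a></td>
</tr>'

# Keep only the first matching line, then print just the captured text.
ver=$(printf '%s\n' "$page" \
    | grep -m 1 'version always-ltr' \
    | sed -n 's/^.*">\([^<][^<]*\).*$/\1/p')

if [[ -n $ver ]]; then
    echo "$ver"
else
    echo "version not found" >&2
fi
```

Checking for an empty result afterwards gives the script a way to notice when the page layout has changed.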

Note: HTML should be processed with an HTML/XML-aware application like xmllint or xmlstarlet. There are far too many variations and caveats in what a download with wget or curl may return to rely solely on shell text processing to extract data consistently.

David C. Rankin