I am currently working on a language that aims to compile to POSIX shell languages and I want to introduce a pop feature. Just like how you can use "shift" to remove the first argument passed to a function:

f() {
  shift
  printf '%s' "$*"
}

f 1 2 3 #=> 2 3

I want some code that when introduced below can remove the last argument.

g() {
  # pop
  printf '%s' "$*"
}

g 1 2 3 #=> 1 2

I am aware of the array method as detailed in (Remove last argument from argument list of shell script (bash)), but I want something portable that will work in at least the following shells: ash, dash, ksh (Unix), bash, and zsh. I also want something reasonably speedy; something that opens external processes or subshells would be too heavy for small argument counts, though if you have a creative solution I wouldn't mind seeing it regardless (it could still be used as a fallback for large argument counts). Something as fast as those array methods would be ideal.

phicr

3 Answers

4

This is my current answer:

pop() {
  local n=$(($1 - ${2:-1}))
  if [ -n "$ZSH_VERSION" -o -n "$BASH_VERSION" ]; then
    POP_EXPR='set -- "${@:1:'$n'}"'
  elif [ $n -ge 500 ]; then
    POP_EXPR="set -- $(seq -s " " 1 $n | sed 's/[0-9][0-9]*/"${&}"/g')"
  else
    local index=0
    local arguments=""
    while [ $index -lt $n ]; do
      index=$((index+1))
      arguments="$arguments \"\${$index}\""
    done
    POP_EXPR="set -- $arguments"
  fi
}

Note that local is not POSIX, but all major sh-style shells support it (specifically, the ones I asked for in my question), and not having it can cause serious bugs, so I decided to use it in this leading function. Here is a fully POSIX-compliant version that uses obfuscated variable names instead, to reduce the chance of name clashes:

pop() {
  __pop_n=$(($1 - ${2:-1}))
  if [ -n "$ZSH_VERSION" -o -n "$BASH_VERSION" ]; then
    POP_EXPR='set -- "${@:1:'$__pop_n'}"'
  elif [ $__pop_n -ge 500 ]; then
    POP_EXPR="set -- $(seq -s " " 1 $__pop_n | sed 's/[0-9][0-9]*/"${&}"/g')"
  else
    __pop_index=0
    __pop_arguments=""
    while [ $__pop_index -lt $__pop_n ]; do
      __pop_index=$((__pop_index+1))
      __pop_arguments="$__pop_arguments \"\${$__pop_index}\""
    done
    POP_EXPR="set -- $__pop_arguments"
  fi
}

Usage

pop1() {
  pop $#
  eval "$POP_EXPR"
  echo "$@"
}

pop2() {
  pop $# 2
  eval "$POP_EXPR"
  echo "$@"
}

pop1 a b c #=> a b
pop1 $(seq 1 1000) #=> 1 .. 999
pop2 $(seq 1 1000) #=> 1 .. 998
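
To make it easier to see what eval receives, here is a minimal sketch of the loop branch alone. The names build_pop_expr, show_popped and POP_EXPR_DEMO are made up for this illustration and are not part of the function above. Because every parameter reference is individually double-quoted, arguments containing spaces should survive the round trip through eval.

```shell
# Hypothetical, simplified stand-in for the loop branch of pop.
# It stores the expression in POP_EXPR_DEMO (an illustrative name).
build_pop_expr() {
  bpe_n=$(($1 - 1))
  bpe_i=0
  bpe_args=""
  while [ "$bpe_i" -lt "$bpe_n" ]; do
    bpe_i=$((bpe_i + 1))
    # Append one quoted reference such as "${3}" to the list.
    bpe_args="$bpe_args \"\${$bpe_i}\""
  done
  POP_EXPR_DEMO="set --$bpe_args"
}

# Pops the last argument, then prints the remaining count and arguments.
show_popped() {
  build_pop_expr $#
  eval "$POP_EXPR_DEMO"
  printf '%s\n' "$#: $*"
}

build_pop_expr 4
printf '%s\n' "$POP_EXPR_DEMO"                     # set -- "${1}" "${2}" "${3}"
show_popped "first arg" "second arg" "third arg"   # 2: first arg second arg
```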

pop_next

Once you've created the POP_EXPR variable with pop, you can use the following function to change it to omit further arguments:

pop_next() {
  if [ -n "$BASH_VERSION" -o -n "$ZSH_VERSION" ]; then
    local np="${POP_EXPR##*:}"
    np="${np%\}*}"
    POP_EXPR="${POP_EXPR%:*}:$((np == 0 ? 0 : np - 1))}\""
    return
  fi
  POP_EXPR="${POP_EXPR% \"*}"
}

pop_next is a much simpler operation than pop in POSIX shells (though it's slightly more complex than pop on zsh and bash).

It's used like this:

main() {
  pop $#
  pop_next
  eval "$POP_EXPR"
  echo "$@"
}

main 1 2 3 #=> 1
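
As a rough illustration of the POSIX branch of pop_next, the snippet below applies the same suffix-stripping expansion to an expression written out by hand. The starting string is an assumption about what pop would have produced for four arguments, and demo is a throwaway name.

```shell
# The kind of expression the POSIX branch of pop builds for 4 arguments
# (written out by hand here for illustration).
POP_EXPR='set -- "${1}" "${2}" "${3}"'

# Drop the shortest suffix that starts with a space and a double quote,
# i.e. the last quoted parameter reference.
POP_EXPR="${POP_EXPR% \"*}"
printf '%s\n' "$POP_EXPR"   # set -- "${1}" "${2}"

# Evaluating the shortened expression keeps only the first two arguments.
demo() {
  eval "$POP_EXPR"
  printf '%s\n' "$#: $*"
}
demo a b c d                # 2: a b
```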

POP_EXPR and variable scope

Note that if you don't run eval "$POP_EXPR" immediately after pop and pop_next, a function call in between could change the POP_EXPR variable and mess things up unless you are careful with scoping. To avoid this, put local POP_EXPR at the start of every function that uses pop, if local is available.

f() {
  local POP_EXPR
  pop $#
  g 1 2
  eval "$POP_EXPR"
  printf '%s' "f=$*"
}

g() {
  local POP_EXPR
  pop $#
  eval "$POP_EXPR"
  printf '%s, ' "g=$*"
}

f a b c #=> g=1, f=a b

popgen.sh

This particular function is good enough for my purposes, but I did create a script to generate further optimized functions.

https://gist.github.com/fcard/e26c5a1f7c8b0674c17c7554fb0cd35c#file-popgen-sh

One way to improve performance without external tools is to realize that many small string concatenations are slow, so doing them in batches makes the function considerably faster. Calling the script as popgen.sh -gN1,N2,N3 creates a pop function that handles the operations in batches of N1, N2, or N3, depending on the argument count. The script also contains other tricks, exemplified and explained below:
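
The batching idea can be sketched by hand as follows. This is not the output of popgen.sh, only an illustration with a fixed batch size of 4; build_batched, BATCH_EXPR and pop_batched are hypothetical names.

```shell
# Hypothetical sketch: build 'set -- "${1}" ... "${n}"' with one string
# concatenation per 4 parameters instead of one per parameter.
build_batched() {
  bb_n=$1
  bb_i=0
  bb_out=""
  # Full batches of 4 references per concatenation.
  while [ $((bb_i + 4)) -le "$bb_n" ]; do
    bb_out="$bb_out \"\${$((bb_i + 1))}\" \"\${$((bb_i + 2))}\" \"\${$((bb_i + 3))}\" \"\${$((bb_i + 4))}\""
    bb_i=$((bb_i + 4))
  done
  # Leftover references, appended one at a time.
  while [ "$bb_i" -lt "$bb_n" ]; do
    bb_i=$((bb_i + 1))
    bb_out="$bb_out \"\${$bb_i}\""
  done
  BATCH_EXPR="set --$bb_out"
}

# Pops the last argument using the batched builder.
pop_batched() {
  build_batched $(($# - 1))
  eval "$BATCH_EXPR"
  printf '%s\n' "$#: $*"
}

pop_batched a b c d e f g   # 6: a b c d e f
```

A generated function would presumably pick larger batch sizes for larger argument counts; the size 4 here is only chosen to keep the example readable.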

$ sh popgen  \
>  -g 10,100 \ # concatenate strings in batches\
>  -w        \ # overwrite current file\
>  -x9       \ # hardcode the result of the first 9 argument counts\
>  -t1000    \ # starting at argument count 1000, use external tools\
>  -p posix  \ # prefix to add to the function name (with a underscore)\
>  -s ''     \ # suffix to add to the function name (with a underscore)\
>  -c        \ # use the command popsh instead of seq/sed as the external tool\
>  -@        \ # on zsh and bash, use the subarray method (checks on runtime)\
>  -+        \ # use bash/zsh extensions (removes runtime check from -@)\
>  -nl       \ # don't use 'local'\
>  -f        \ # use 'function' syntax\
>  -o pop.sh   # output file

An equivalent to the above function can be generated with popgen.sh -t500 -g1 -@. The gist containing popgen.sh also has a popsh.c file that can be compiled and used as a specialized, faster alternative to the default external shell tools. Any function generated with popgen.sh -c ... will use it if it's accessible to the shell as popsh. Alternatively, you can create any function or tool named popsh and use it in its place.

Benchmark

Benchmark functions:

The script I used for benchmarking can be found on this gist: https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-popbench-sh

The benchmark functions are found in these lines: https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-popbench-sh-L233-L301

The script can be used as such:

$ sh popbench.sh   \
>   -s dash        \ # shell used by the benchmark, can be dash/bash/ash/zsh/ksh.\
>   -f posix       \ # function to be tested\
>   -i 10000       \ # number of times that the function will be called per test\
>   -a '\0'        \ # replacement pattern to model arguments by index (uses sed)\
>   -o /dev/stdout \ # where to print the results to (concatenates, defaults to stdout)\
>   -n 5,10,1000     # argument sizes to test

It will output a time -p style sheet with real, user, and sys time values, as well as an int (internal) value that is calculated inside the benchmark process using date.

Times

The following are the int results of calls to

$ sh popbench.sh -s $shell -f $function -i 10000 -n 1,5,10,100,1000,10000

posix refers to the second and third branches of pop's if statement (the seq/sed branch and the loop branch), subarray refers to the first branch, and final refers to the whole function.

value count           1           5          10         100        1000        10000
---------------------------------------------------------------------------------------
dash/final        0m0.109s    0m0.183s    0m0.275s    0m2.270s   0m16.122s   1m10.239s
ash/final         0m0.104s    0m0.175s    0m0.273s    0m2.337s   0m15.428s   1m11.673s
ksh/final         0m0.409s    0m0.557s    0m0.737s    0m3.558s   0m19.200s   1m40.264s
bash/final        0m0.343s    0m0.414s    0m0.470s    0m1.719s   0m17.508s   3m12.496s
---------------------------------------------------------------------------------------
bash/subarray     0m0.135s    0m0.179s    0m0.224s    0m1.357s   0m18.911s   3m18.007s
dash/posix        0m0.171s    0m0.290s    0m0.447s    0m3.610s   0m17.376s    1m8.852s
ash/posix         0m0.109s    0m0.192s    0m0.285s    0m2.457s   0m14.942s   1m10.062s
ksh/posix         0m0.416s    0m0.581s    0m0.768s    0m4.677s   0m18.790s   1m40.407s
bash/posix        0m0.409s    0m0.739s    0m1.145s   0m10.048s   0m58.449s  40m33.024s

On zsh

For large argument counts, running set -- ... through eval is very slow on zsh no matter the method, save for eval 'set -- "${@:1:$# - 1}"'. Even a modification as simple as changing it to eval "set -- ${@:1:$# - 1}" (ignoring that it doesn't work for arguments with spaces) makes it two orders of magnitude slower.

value count           1           5          10         100        1000        10000
---------------------------------------------------------------------------------------
zsh/subarray      0m0.203s    0m0.227s    0m0.233s    0m0.461s    0m3.643s   0m38.396s
zsh/final         0m0.399s    0m0.416s    0m0.441s    0m0.722s    0m4.205s   0m37.217s
zsh/posix         0m0.718s    0m0.913s    0m1.182s    0m6.200s   0m46.516s  42m27.224s
zsh/eval-zsh      0m0.419s    0m0.353s    0m0.375s    0m0.853s    0m5.771s  32m59.576s

More benchmarks

For more benchmarks, including ones that use only external tools, the C popsh tool, or the naive algorithm, see this file:

https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-benchmarks-md

It's generated like this:

$ git clone https://gist.github.com/f4aec7e567da2a8e97962d5d3f025ad4.git popbench
$ cd popbench
$ sh popgen_run.sh
$ sh popbench_run.sh --fast # or without --fast if you have a day to spare
$ sh poptable.sh -g >benchmarks.md

Conclusion

This is the result of a week of research on the subject, and I thought I'd share it. Hopefully it's not too long; I tried to trim it to the main information, with links to the gists. This was initially written as an answer to (Remove last argument from argument list of shell script (bash)), but I felt the focus on POSIX made it off topic there.

All the code in the gists linked here is licensed under the MIT license.

phicr
  • `local` isn't part of POSIX. – chepner Sep 12 '20 at 21:41
  • True, which is why I have an option in popgen.sh to not use it. `local` is on all the shells I require as mentioned, including unix ksh, which is why my function uses it, but perhaps for the sake of this answer I should remove it. – phicr Sep 12 '20 at 21:50
  • Frankly, I didn't read that far. This is really beyond the scope of a Stack Overflow question; it's more like a blog post. – chepner Sep 12 '20 at 21:51
  • I am aware the answer is long, but it's still fully devoted to answering a simple question (one that's been asked before, plus the added requirements of POSIX and speed). I don't see the harm in having a detailed answer, especially since I've made a point of putting the part I know most people will be interested in (the code) at the top. – phicr Sep 12 '20 at 22:27
  • I've organized the info and code here into a proper github repo: https://github.com/fcard/pop.sh; If people really think that there's too much info here, I can now link to it instead where appropriate. – phicr Sep 13 '20 at 19:05
2
alias pop='set -- $(eval printf '\''%s\\n'\'' $(seq $(expr $# - 1) | sed '\''s/^/\$/;H;$!d;x;s/\n/ /g'\'') )'

EDIT:

This is a POSIX shell solution that uses aliases instead of functions. If called in a function, it gives the desired effect: it resets the function's arguments to the same list minus the last one. Being an alias that uses eval, it can change the positional parameters of the enclosing function:

func () {
    pop
    pop
    echo "$@"
}
func a b c d e      # prints a b c
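
To see what the pieces of the alias do, the sketch below first runs the seq/sed pipeline on its own and then inlines the alias body into a function (popped_count is a made-up name). It assumes seq and sed are available. Since the result of the command substitution is word-split before set -- sees it, an argument containing a space appears to come back as two arguments, which may matter depending on your inputs.

```shell
# Step 1: the seq/sed pipeline turns a column of numbers into
# a single line of parameter references: $1 $2 $3
refs=$(seq 3 | sed 's/^/\$/;H;$!d;x;s/\n/ /g')
printf '%s\n' "$refs"

# Step 2: the alias body, inlined here so it runs without alias expansion.
# Prints how many arguments remain after popping one.
popped_count() {
  set -- $(eval printf '%s\\n' $(seq $(expr $# - 1) | sed 's/^/\$/;H;$!d;x;s/\n/ /g'))
  printf '%s\n' "$#"
}

popped_count a b c       # 2
popped_count "a b" c d   # 3, because "a b" was split into two words
```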
phranz
  • could you please explain the execution flow of this alias or at least the `sed` part? – saulius2 Dec 16 '22 at 20:16
  • @saulius2 the sed part simply translates the column of numbers generated by seq into a row of all positional parameters except the last by replacing newlines with spaces and prepending a '$' to each number. This can be achieved in a multitude of other ways. – phranz Jan 17 '23 at 15:13
1
pop () {
    i=0
    while [ $((i+=1)) -lt $# ]; do
        set -- "$@" "$1"
        shift
    done # 1 2 3 -> 3 1 2
    printf '%s' "$1" # last argument
    shift # $@ is now without last argument
}
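
Note that set and shift inside a function only change that function's own positional parameters, so calling pop as written prints the last argument but leaves the caller's list alone. A possible way to use the same rotation to actually drop the last argument is to inline the loop in the function that owns the arguments. The sketch below is one such variant, with a guard for an empty list; rotate_pop_demo is a hypothetical name.

```shell
# Hypothetical variant: the rotation loop inlined where the arguments live.
rotate_pop_demo() {
  if [ $# -gt 0 ]; then
    i=1
    # Move the first argument to the end, $# - 1 times,
    # so that the original last argument ends up in $1.
    while [ "$i" -lt $# ]; do
      set -- "$@" "$1"
      shift
      i=$((i + 1))
    done
    shift   # drop the original last argument
  fi
  printf '%s\n' "$#: $*"
}

rotate_pop_demo "a b" c d   # 2: a b c
```

This needs neither eval nor external tools and keeps arguments with spaces intact, but it rebuilds the whole list once per argument, so it is likely to be slow for large argument counts.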