
I want to run a computation on every *.bin file in a given directory. Initially I used a for loop:

var=0
for i in *ls *bin
do
   # perform computations on "$i" ...
   ((var++))
done
echo $var

However, some directories contain so many files that this fails with the error: Argument list too long

So I tried a piped while loop instead:

var=0
ls *.bin | while read i;
do
  # perform computations on "$i"
  ((var++))
done
echo $var

The problem now is that the pipe creates a subshell, so echo $var prints 0.
How can I deal with this problem?
The original code:

#!/bin/bash

function entropyImpl {
    if [[ -n "$1" ]]
    then
        if [[ -e "$1" ]]
        then
            echo "scale = 4; $(gzip -c "$1" | wc -c) / $(wc -c < "$1")" | bc
        else
            echo "file ($1) not found"
            return 1
        fi
    else
        datafile="$(mktemp entropy.XXXXX)"
        cat - > "$datafile"
        entropyImpl "$datafile"
        rm "$datafile"
    fi

    return 0
}
declare acc_entropy=0
declare count=0

ls *.bin | while IFS= read -r i
do
    echo "Computing $i" | tee -a entropy.txt
    curr_entropy=$(entropyImpl "$i")
    curr_entropy=$(echo "$curr_entropy" | bc)
    echo -e "\tEntropy: $curr_entropy" | tee -a entropy.txt
    acc_entropy=$(echo "$acc_entropy + $curr_entropy" | bc)
    let count+=1
done

echo "Out of function: $count | $acc_entropy"
acc_entropy=`echo "scale=4; $acc_entropy / $count" | bc`

echo -e "===================================================\n" | tee -a entropy.txt
echo -e "Accumulated Entropy:\t$acc_entropy ($count files processed)\n" | tee -a entropy.txt
codeforester
user1192748
  • A `for` loop is evaluated in the shell, and thus by itself does not produce "argument list too long". Perhaps you can go back to that code and fix whatever else was wrong there. (The `*ls` looks misplaced there; perhaps the original problem was a [useless use of `ls`](http://www.iki.fi/era/unix/award.html#ls)?) – tripleee Sep 18 '16 at 16:33

4 Answers


The problem is that the while loop is part of a pipeline. In a bash pipeline, every element of the pipeline is executed in its own subshell [ref]. So after the while loop terminates, the while loop subshell's copy of var is discarded, and the original var of the parent (whose value is unchanged) is echoed.
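A minimal sketch of the behaviour described above, assuming bash with default settings (in particular `lastpipe` off). `printf` stands in for `ls *.bin` so no files are needed, and the function names `count_lost` and `count_grouped` are made up for illustration. The second function shows one more workaround: put the loop and everything that needs `var` inside a single `{ ...; }` group on the right of the pipe, so they share the same subshell.

```shell
#!/usr/bin/env bash
# Sketch only: printf stands in for 'ls *.bin' so the example needs no files.

# The loop is an element of a pipeline, so it runs in a subshell: it
# increments that subshell's copy of var, which is discarded when the loop ends.
count_lost() {
    local var=0 line
    printf '%s\n' a.bin b.bin c.bin | while IFS= read -r line; do
        var=$((var + 1))
    done
    echo "$var"    # the function's own var, never incremented
}

# Workaround: group the loop and every command that needs var into one
# compound command on the right of the pipe, so they run in the same subshell.
count_grouped() {
    local var=0 line
    printf '%s\n' a.bin b.bin c.bin | {
        while IFS= read -r line; do
            var=$((var + 1))
        done
        echo "$var"    # same subshell as the loop, so it sees the increments
    }
}
```

`count_lost` prints 0 and `count_grouped` prints 3. The grouped value still disappears once the braces close, so this only helps when all the code that uses `var` fits inside the group.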

One way to fix this is by using Process Substitution as shown below:

var=0
while IFS= read -r i;
do
  # perform computations on "$i"
  ((var++))
done < <(find . -maxdepth 1 -type f -name "*.bin")
echo "$var"

Take a look at BashFAQ/024 for other workarounds.
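One such workaround in bash 4.2 and later is the `lastpipe` shell option, which runs the last element of a pipeline in the current shell when job control is off (as it is in a non-interactive script). A sketch, where `count_bin_lastpipe` and the global `bin_count` are hypothetical names:

```shell
#!/usr/bin/env bash
# With lastpipe set and job control off (the default in scripts), the LAST
# element of a pipeline runs in the current shell, so the loop's
# assignments survive after 'done'.
shopt -s lastpipe

count_bin_lastpipe() {
    local dir=$1 file
    bin_count=0                     # global, so the caller can read it
    find "$dir" -maxdepth 1 -type f -name '*.bin' | while IFS= read -r file; do
        # perform computations on "$file" here
        bin_count=$((bin_count + 1))
    done
}
```

The option has no effect in an interactive shell with job control enabled, so it is mostly useful inside scripts.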

Note that I have also replaced ls with find, because parsing the output of ls is not good practice.
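If file names may contain spaces or even newlines, a NUL-delimited variant of the same loop is safer. This sketch assumes a find that supports `-print0` (GNU and BSD find do); `count_bin_files_nul` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Variant of the process-substitution loop that also survives file names
# containing spaces or newlines: -print0 ends each name with a NUL byte,
# and read -d '' reads up to the next NUL.
count_bin_files_nul() {
    local dir=$1 var=0 file
    while IFS= read -r -d '' file; do
        # perform computations on "$file" here
        var=$((var + 1))
    done < <(find "$dir" -maxdepth 1 -type f -name '*.bin' -print0)
    echo "$var"    # the loop ran in this shell, so var holds the count
}
```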

nishanthshanmugham
dogbane
  • This is correct for bash, zsh and ksh, but is not POSIX compliant. For example, put bash in POSIX mode (`set -o posix`) and try such a command. You get: `` syntax error near unexpected token `<' `` – Dunatotatos Sep 18 '16 at 15:12
  • To say "the while loop is executed in a subshell" is correct but somewhat misleading in this context -- one might assume loops are generally in a subshell which is not the case. The issue here is that *any* command on the right hand side of a pipe would be in a subshell. That we have a compound command, namely a loop, is just a coincidence. (Although that coincidence is where the subshell issue manifests itself because only a compound command provides the opportunity to assign values to variables which are not persisted.) – Peter - Reinstate Monica Jun 07 '17 at 09:04
  • Great answer, saved my life. While my code is little bit more complex I added an answer myself here: https://stackoverflow.com/questions/55239979/how-to-avoid-subshell-behaviour-using-while-and-find/55239981#55239981 – ingobaab Mar 19 '19 at 11:30
  • I think ";" is not needed at the end of the "while read i" – Ellis Aug 14 '21 at 14:38
  • @Peter-ReinstateMonica: You are correct. But your comment applies not just for "right hand side of a pipe". In bash, _every_ element of a pipeline, including the first element (i.e. not on the right hand side), is executed in its own subshell. For example: https://ideone.com/X41sYJ. – nishanthshanmugham Oct 17 '22 at 22:12
  • This is black magic. Thank you for sharing. – EJK Jun 07 '23 at 03:21
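The point made in the last comments can be checked directly. This sketch (bash with `lastpipe` off, the default; the function name is made up) assigns a variable on each side of a pipeline and prints both afterwards:

```shell
#!/usr/bin/env bash
# Both sides of a pipeline run in subshells (with lastpipe off), so neither
# assignment below is visible once the pipeline has finished.
pipeline_subshell_demo() {
    x=before
    y=before
    { x=left; } | { y=right; }
    echo "$x $y"
}
```

It prints `before before`: neither assignment survives, because each side ran in its own subshell.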

A POSIX-compliant solution is to use a named pipe (a FIFO). It is portable, but it requires creating a special file in the filesystem.

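mkfifo mypipe
# -maxdepth is not specified by POSIX; "! -name . -prune" portably keeps find in the current directory
find . ! -name . -prune -type f -name "*.bin" > mypipe &
while IFS= read -r line
do
    # action, e.g.:
    printf '%s\n' "$line"
done < mypipe
rm mypipe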

The pipe is a file in the filesystem, so remember to remove it afterwards if you want to avoid leaving useless files behind.

Dunatotatos
  • `trap 'rm -rf $TMPFIFODIR' EXIT; TMPFIFODIR=$(mktemp -d); mkfifo $TMPFIFODIR/mypipe` at the beginning of the script, and reading/writing that fifo would take care of the "do not forget to remove it" issue. – Mike S Oct 10 '17 at 17:03
  • Alternatively, you can use an actual file, if you don't mind forcing everything to run sequentially, and capturing all the output in a file. – torek Oct 21 '18 at 16:39
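Combining the named-pipe answer with the trap/mktemp suggestion gives roughly the following bash sketch. `fifodir`, `count_bin_files_fifo` and `bin_count` are hypothetical names, `mktemp -d` and `find -maxdepth` are common but not POSIX, and the FIFO is created once and reused for each call:

```shell
#!/usr/bin/env bash
# Create the FIFO in a private temporary directory and let an EXIT trap
# delete it when the script ends, however it ends.
fifodir=$(mktemp -d)
trap 'rm -rf "$fifodir"' EXIT
mkfifo "$fifodir/mypipe"

count_bin_files_fifo() {
    local dir=$1 file
    bin_count=0
    # Writer in the background; opening the FIFO blocks until a reader opens it.
    find "$dir" -maxdepth 1 -type f -name '*.bin' > "$fifodir/mypipe" &
    # The loop reads from a redirection, not a pipeline, so it runs in the
    # current shell and bin_count keeps its value afterwards.
    while IFS= read -r file; do
        bin_count=$((bin_count + 1))
    done < "$fifodir/mypipe"
    wait "$!"    # reap the background find
}
```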

While researching the general problem of passing variables from a while loop that runs in a subshell back to its parent, I found one solution not mentioned here: a here-string. Since here-strings are a bash feature and I preferred a POSIX solution, I used a here-document instead, which a here-string is essentially shorthand for. This avoids the subshell, so variables set inside the loop remain available afterwards.

#!/bin/sh

set -eu

passwd="username,password,uid,gid
root,admin,0,0
john,appleseed,1,1
jane,doe,2,2"

main()
{
    while IFS="," read -r _user _pass _uid _gid; do
        if [ "${_user}" = "${1:-}" ]; then
            password="${_pass}"
        fi
    done <<-EOT
        ${passwd}
    EOT

    if [ -z "${password:-}" ]; then
        echo "No password found."
        exit 1
    fi

    echo "The password is '${password}'."
}

main "${@}"

exit 0

One important note for anyone copying this: the here-document is opened with `<<-` (note the hyphen), which strips leading tabs from its lines so the body and the closing EOT can stay indented. Stack Overflow renders tabs in code as spaces, and `<<-` does not strip spaces, so if you copy the code as shown, the indented EOT will not be recognized as the terminator; convert that indentation back to tabs.

This can also break depending on your editor settings, so the alternative is to leave the body and the terminator unindented:

    done <<-EOT
${passwd}
EOT
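The same here-document trick can be applied to the original counting problem by feeding the output of `find` in through a command substitution. A bash sketch (apart from the non-POSIX `-maxdepth`, it should also run in POSIX shells); `count_bin_files_heredoc` and `bin_count` are hypothetical names:

```shell
#!/usr/bin/env bash
# The command substitution runs first and its output becomes the
# here-document, so the while loop itself runs in the current shell.
# Caveats: the whole list is held in memory, and names containing
# newlines are split into several lines.
count_bin_files_heredoc() {
    dir=$1
    bin_count=0
    while IFS= read -r file; do
        [ -n "$file" ] || continue    # no matches still yields one empty line
        bin_count=$((bin_count + 1))
    done <<EOF
$(find "$dir" -maxdepth 1 -type f -name '*.bin')
EOF
}
```

Unlike the FIFO or process-substitution versions, find has already finished before the loop starts.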
oliver

This could be done with a for loop, too:

var=0
for file in $(find . -maxdepth 1 -type f -name "*.bin"); do
    # perform computations on "$file"
    ((var++))
done
echo "$var"
tripleee
Thomas K.
  • No, this would produce the "argument list too long" error they were trying to avoid in the first place, and additionally break (possibly with serious security implications) on file names with whitespace. – tripleee Sep 18 '16 at 16:30
  • With find you won't get an "argument list too long" error, only with ls. – Thomas K. Sep 20 '16 at 07:55
  • I stand corrected, this seems to avoid the "argument list too long" error at least with reasonably recent versions of Bash. The whitespace error is still a problem. – tripleee Sep 20 '16 at 08:39
  • `for file in *.bin; do` won't produce an "argument list too long" error in bash either. So the find command doesn't bring anything extra to the party. The argument list too long error is from the kernel. Since `for` is a shell builtin, there is no exec'ing a command with a long argument list. See https://stackoverflow.com/questions/19354870/bash-command-line-and-input-limit/19355351 – Mike S Dec 29 '20 at 21:55
  • To clarify, my "shell builtin" sentence is better like this: Since `for` is a shell builtin, the shell will not exec a command, so the argument list limitation does not apply. – Mike S Dec 29 '20 at 22:38
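Following the last two comments, a plain glob loop avoids both `find` and the argument-length problem. A bash sketch, with `count_bin_files_glob` as a hypothetical name; `nullglob` makes an unmatched pattern expand to nothing instead of the literal `*.bin`:

```shell
#!/usr/bin/env bash
# The shell expands "$dir"/*.bin itself and execs no external command with
# the expanded list, so the kernel's argument-length limit is not involved.
count_bin_files_glob() {
    local dir=$1 var=0 f
    shopt -s nullglob                 # no match -> empty list, not '*.bin'
    for f in "$dir"/*.bin; do
        [ -f "$f" ] || continue       # skip directories etc., like find -type f
        # perform computations on "$f" here
        var=$((var + 1))
    done
    shopt -u nullglob                 # note: turns it off even if it was on before
    echo "$var"
}
```

Unlike the `$(find ...)` loop in this answer, names containing spaces are handled correctly, because the glob results are not word-split.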