46

I am trying to find the pathname with the most characters in it. There might be better ways to do this. But I would like to know why this problem occurs.

LONGEST_CNT=0
find samples/ | while read line
do
    line_length=$(echo $line | wc -m)
    if [[ $line_length -gt $LONGEST_CNT ]] 
    then
        LONGEST_CNT=$line_length
        LONGEST_STR=$line
    fi
done

echo $LONGEST_CNT : $LONGEST_STR

It somehow always returns:

0 :

If I print the results for debugging inside the while loop the values are correct. So why bash does not make these variables global?

codeforester
  • 39,467
  • 16
  • 112
  • 140
Mark
  • 7,507
  • 12
  • 52
  • 88

4 Answers4

85

When you pipe into a while loop in Bash, it creates a subshell. When the subshell exits, all variables return to their previous values (which may be null or unset). This can be prevented by using process substitution.

LONGEST_CNT=0
while read -r line
do
    line_length=${#line}
    if (( line_length > LONGEST_CNT ))
    then
        LONGEST_CNT=$line_length
        LONGEST_STR=$line
    fi
done < <(find samples/ )    # process substitution

echo $LONGEST_CNT : $LONGEST_STR
Dennis Williamson
  • 346,391
  • 90
  • 374
  • 439
  • 1
    The solution Dennis provided works, but be aware, that it violates POSIX norm. Try to 'set -o posix' and the script will not work! – RobSis Feb 29 '12 at 12:10
  • 2
    @Robert: The question is tagged [bash]. It is true that process substitution is not specified by POSIX. However, unless portability to POSIX-only shells is a concern, there's no reason not to use Bash-specific features. By the way, process substitution is supported by ksh and zsh. However, they don't create a subshell that causes the variable to be lost. Also note that Bash 4.2 has an option, `shopt -s lastpipe` which "runs the last command of a pipeline in the current shell context" (not a subshell) - unless job control is in affect. – Dennis Williamson Feb 29 '12 at 16:11
  • I use this, but get syntax error: `syntax error near unexpected token `done'` – Aswin Murugesh Nov 02 '15 at 07:35
  • @AswinMurugesh: Most likely, you are missing the `do`. – Dennis Williamson Nov 02 '15 at 12:28
  • 1
    *return to their previous values* ... well, IMHO better formulation would be that the changes happen in a temporary copy of environment (even different PID I think) that gets destroyed as the while loop is over. The original global variable is never touched at all; there's no reason to "return". – Alois Mahdal Dec 03 '15 at 19:17
20

The "correct" reply is given by Dennis. However, I find the process substitution trick extremely unreadable if the loop contains more than a few lines. When reading a script, I want to see what goes into the pipe before I see how it is processed.

So I usually prefer this trick of encapsulating the while loop in "{}".

LONGEST_CNT=0
find /usr/share/zoneinfo | \
{ while read -r line
    do
        line_length=${#line}
        if (( line_length > LONGEST_CNT ))
        then
            LONGEST_CNT=$line_length
            LONGEST_STR=$line
        fi
    done
    echo $LONGEST_CNT : $LONGEST_STR
}
Community
  • 1
  • 1
mivk
  • 13,452
  • 5
  • 76
  • 69
  • Totally agree about the readability point. +1. Still, it's the way to go. – 0xC0000022L Feb 06 '13 at 22:56
  • 2
    except that this example does not modify the global variable. if you check LONGEST_CNT **after** the loop, you will see it's still zero – Deim0s Dec 23 '14 at 17:22
  • @Deims0s : yes. That is why the echo statement is inside the "{}". If the variables are needed again later, the process substitution way might be better suited. – mivk Jul 10 '15 at 10:13
2

About finding the longest pathname. Here's an alternative:

find /usr/share/zoneinfo | while read line; do
    echo ${#line} $line 
done | sort -nr | head -n 1

# Result:
58 /usr/share/zoneinfo/right/America/Argentina/ComodRivadavia

Forgive me if this is considered off topic, I hope it helps someone.

grebneke
  • 4,414
  • 17
  • 24
  • Using `echo` to print the output from backicks is rarely useful or necessary. See [useless use of `echo`](http://partmaps.org/era/unix/award.html#echo). – tripleee Dec 27 '13 at 09:43
  • 1
    @tripleee: AFAICT both echos are useful in this example: wc needs the input from $line, and $line needs to be displayed after the output from wc. Please, feel free to suggest improvements without adding more lines or variables. – grebneke Dec 29 '13 at 01:36
  • 2
    @grebneke I upvoted your answer (and comment) but after thinking about it some more, I realised you could avoid using command substitution to run `wc`. Instead, you could use [POSIX shell parameter expansion](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02): `echo ${#line} $line;` – Anthony Geoghegan Nov 04 '15 at 11:31
  • 1
    @AnthonyGeoghegan nice suggestion. To be clear, your example uses [# for String Length](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02) as opposed to [Number of positional parameters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02) - I think you accidentally posted the wrong link. `#` is used for both and it can get confusing. – grebneke Nov 16 '15 at 11:06
  • @grebneke Good catch. I posted the wrong link while trying to be quick. :( – Anthony Geoghegan Nov 16 '15 at 11:11
  • 1
    @AnthonyGeoghegan updated with your solution, **way faster**, gj tyvm – grebneke Nov 16 '15 at 11:31
2

Do what you always (should) do:

  • separate concerns,
  • avoid globals,
  • document your code,
  • be readable,
  • maybe be POSIXy.

(Yes, I added a bit more "good practice" ingredients to the soup than absolutely necessary ;))

So my favorite "knee-jerk reaction" to problems with invisible subshell is to use function:

#!/bin/sh

longest() {
    #
    # Print length and body of the longest line in STDIN
    #
    local cur_ln    # current line
    local cur_sz    # current size (line length)
    local max_sz    # greatest size so far
    local winner    # longest string so far
    max_sz=0
    while read -r cur_ln
    do
        cur_sz=${#cur_ln}
        if test "$cur_sz" -gt "$max_sz";
        then
            max_sz=$cur_sz
            winner=$cur_ln
        fi
    done
    echo "$max_sz" : "$winner"
}

find /usr/share/zoneinfo | longest

# ok, if you really wish to use globals, here you go ;)
LONGEST_CNT=0
LONGEST_CNT=$(
    find /usr/share/zoneinfo \
      | longest \
      | cut -d: -f1 \
      | xargs echo\
)
echo "LONGEST_CNT='$LONGEST_CNT'"

Aside from avoiding the subshell annoyance, it gives you perfect place to document the code, and sort-of adds namespacing: notice that inside function you can use much shorter and simpler variable names without losing any of readability.

Alois Mahdal
  • 10,763
  • 7
  • 51
  • 69
  • 1
    The last part of this is the best answer really, but fyi you can put the pipes at the ends of the lines and avoid the line continuation backslashes. – Priv Acyplease Jan 28 '19 at 21:27
  • @PrivAcyplease the intentional advantage of this style is that it makes the pipeline structure obvious, especially for commands in the middle. That's IMO well worth the extra chars. – Alois Mahdal Feb 02 '19 at 00:02