2

I want to retrieve the first X and the last Y characters from a string (standard ascii, so no worries about unicode).

I understand that I can do this as separate actions, i.e.:

FIRST=$(echo foobar | head -c 3)
LAST=$(echo foobar | tail -c 3)
COMBINED= "${FIRST}${LAST}"

But is there a cleaner way to do this?

I would prefer to use common standard utils (i.e. bash built-ins, sed, awk etc.). At a push, a Perl one-liner is OK, but no Python or anything else.

Little Code
  • 5
    `combined=${foobar:0:3}${foobar: -3:3}` uses the bash parameter expansions for string-indexing to combine the first and last `3` characters of `foobar` (note: the `space` before `" -3"` is required for offset from the end of the string -- or put it in parentheses `(-3)`). Don't use `ALLCAPS` variable names, those are reserved for environment variables and bash internal variables (like `BASH_REMATCH`). Example: `a=foobar; echo "${a: -3}${a:0:3}"` results in `barfoo` output. – David C. Rankin Feb 10 '22 at 08:08
  • That's very cool @DavidC.Rankin I clearly need to up my game on bash parameter expansions ! – Little Code Feb 10 '22 at 08:55
  • 1
    They are incredibly capable. Just see [man 1 bash](https://www.man7.org/linux/man-pages/man1/bash.1.html) and scroll down to the heading `"Parameter Expansion"` (if you search for it, it's about the 4th match down). You can slice and dice any string you need. The benefit -- they are bash-builtins, so there is no wasted spawning of separate subshells calling linux utilities. – David C. Rankin Feb 10 '22 at 09:00
  • As an aside, the space after the equals sign is an error, and [don't use upper case for your private variables](https://stackoverflow.com/questions/673055/correct-bash-and-shell-script-variable-capitalization); see also https://shellcheck.net/ which can diagnose many beginner bugs and antipatterns. – tripleee Jan 22 '23 at 09:08

2 Answers

2

head + tail: two answers, depending on the -c switch

head + tail character based (with -c, reducing strings)

Under bash, you could do:

string=foobarbaz
echo "${string::3}${string: -3}"
foobaz

But to avoid repetition in case of shorter strings:

if ((${#string}>6));then
    echo "${string::3}${string: -3}"
else
    echo "$string"
fi
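
The same guard can be generalised to arbitrary X and Y. The following is only a minimal sketch, not part of the original answer: the function name `first_last` and its argument order are hypothetical. It computes the tail offset from the start of the string (`${#s}-y`) instead of using a negative offset, on the assumption that you also want Y=0 to yield an empty tail.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name and argument order are assumptions):
# print the first x and the last y characters of a string,
# or the string unchanged when it is not longer than x+y.
first_last() {
    local s=$1
    local -i x=$2 y=$3
    if (( ${#s} > x + y )); then
        # The offset ${#s}-y counts from the start, so y=0 gives an empty tail
        printf '%s\n' "${s:0:x}${s:${#s}-y}"
    else
        printf '%s\n' "$s"
    fi
}

first_last foobarbaz 3 3
first_last foobar 4 4
```

The first call prints `foobaz`; the second prints `foobar` unchanged, because the string is not longer than 4+4 characters.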

Full function

shrinkStr(){
    local sep='..' opt OPTIND OPTARG string varname='' paddstr paddchr=' '
    local -i maxlen=40 lhlen=15 rhlen padd=0
    while getopts 'P:l:m:s:v:p' opt; do
        case $opt in 
            l) lhlen=$OPTARG ;;
            m) maxlen=$OPTARG ;;
            p) padd=1 ;;
            P) paddchr=$OPTARG ;;
            s) sep=$OPTARG ;;
            v) varname=$OPTARG ;;
            *) echo Wrong arg.; return 1 ;;
        esac
    done
    rhlen="maxlen-lhlen-${#sep}"
    ((rhlen<1)) && { echo bad lengths; return 1;}
    shift $((OPTIND-1))
    string="$*"
    if ((${#string}>maxlen)) ;then
        string="${string::lhlen}$sep${string: -rhlen}"
    elif ((${#string}<maxlen)) && ((padd));then
        printf -v paddstr '%*s' $((maxlen-${#string})) ''
        string+=${paddstr// /$paddchr}
    fi
    if [[ $varname ]] ;then
        printf -v "$varname" '%s' "$string"
    else
        echo "$string"
    fi
}

Then

shrinkStr -l 4 -m 10 Hello world!
Hell..rld!

shrinkStr -l 2 -m 10 Hello world!
He..world!

shrinkStr -l 3 -m 10 -s '+++' Hello world!
Hel+++rld!

This works even with UTF-8 characters:

cnt=1;for str in Généralités Language Théorème Février 'Hello world!';do
    shrinkStr -l5 -m11 -vOutstr -pP_ "$str"
    printf '  %11d:  |%s|\n' $((cnt++)) "$Outstr"
done
            1:  |Généralités|
            2:  |Language___|
            3:  |Théorème___|
            4:  |Février____|
            5:  |Hello..rld!|

cnt=1;for str in Généralités Language Théorème Février 'Hello world!';do
    shrinkStr -l5 -m10 -vOutstr -pP_ "$str"
    printf '  %11d:  |%s|\n' $((cnt++)) "$Outstr"
done
            1:  |Génér..tés|
            2:  |Language__|
            3:  |Théorème__|
            4:  |Février___|
            5:  |Hello..ld!|

head + tail lines based (without -c, reducing files)

By using only one fork to sed.

Here is a little function I wrote for this:

headTail() {
    local hln=${1:-10} tln=${2:-10} str;
    printf -v str '%*s' $((tln-1)) '';
    sed -ne "1,${hln}{p;\$q};$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p"
}

Usage:

headTail <head lines> <tail lines>

Both arguments default to 10.
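
To see what the function actually hands to sed, it can help to print the generated script instead of running it. This is only an illustrative sketch: `showHeadTailScript` is a hypothetical name, and its body simply repeats the string construction from `headTail` above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the same sed script as headTail, but print it
# instead of passing it to sed.
showHeadTailScript() {
    local hln=${1:-10} tln=${2:-10} str
    # str holds tln-1 spaces; each space is then replaced by one '$!N;'
    printf -v str '%*s' $((tln-1)) ''
    printf '%s\n' "1,${hln}{p;\$q};$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p"
}

showHeadTailScript 3 4
```

For `3 4` this prints `1,3{p;$q};4{$!N;$!N;$!N;};:a;$!{N;D;ba};p`. Lines 1 to 3 are printed directly (quitting if the input ends there). At line 4, three `N` commands fill the pattern space with a window of four lines. The `N;D` loop then slides that window forward one line at a time until the last line, where the final `p` prints the tail.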

In practice:

headTail 3 4 < <(seq 1 1000)
1
2
3
997
998
999
1000

This seems correct. Testing the border case (where the number of lines is smaller than requested):

headTail 1 9 < <(seq 1 3)
1
2
3
headTail 9 1 < <(seq 1 3)
1
2
3

Taking more lines (I take the first 100 and the last 100 lines, but print only the 2 top lines, 4 middle lines and 2 bottom lines of headTail's output):

headTail 100 100 < <(seq 1 2000)|sed -ne '1,2s/^/T /p;99,102s/^/M /p;199,$s/^/B /p'
T 1
T 2
M 99
M 100
M 1901
M 1902
B 1999
B 2000

BUG (limit): Don't use this with 0 as an argument!

headTail 0 3 < <(seq 1 2000) 
1
1998
1999
2000
headTail 3 0 < <(seq 1 2000) 
1
2
3
1999
2000

BUG (limit): because of the maximum argument length, a large tail count fails:

headTail 4 32762 <<<Foo\ bar
bash: /bin/sed: Argument list too long

To support both cases, the function becomes:

head + tail lines, using one fork to sed

headTail() {
    local hln=${1:-10} tln=${2:-10} str sedcmd=''
    ((hln>0)) && sedcmd+="1,${hln}{p;\$q};"
    if ((tln>0)) ;then
        printf -v str '%*s' $((tln-1)) ''
        sedcmd+="$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p;"
    fi
    sed -nf <(echo "$sedcmd")
}

Then

headTail 3 4 < <(seq 1 1000) |xargs
1 2 3 997 998 999 1000
headTail 3 0 < <(seq 1 1000) |xargs
1 2 3
headTail 0 4 < <(seq 1 1000) |xargs
997 998 999 1000

for i in {6..9};do printf " %3d: " $i;headTail 3 4 < <(seq 1 $i) |xargs; done
   6: 1 2 3 4 5 6
   7: 1 2 3 4 5 6 7
   8: 1 2 3 5 6 7 8
   9: 1 2 3 6 7 8 9

Stronger test with bigger numbers: reading the first 500'000 and the last 500'000 lines from an input of 3'000'000 lines:

headTail 500000 500000 < <(seq 1 3000000) | sed -ne '499999,500002p'
499999
500000
2500001
2500002

headTail 5000000 5000000 < <(seq 1 30000000) | sed -ne '4999999,5000002p'
4999999
5000000
25000001
25000002
F. Hauri - Give Up GitHub
0
$ perl -E '($s, $x, $y) = @ARGV; substr $s, $x, -$y, ""; say $s' abcdefgh 2 3
abfgh

The four-argument variant of substr replaces the given portion of the string with the last argument. Here, we replace from position $x to position -$y (negative numbers count from the end of the string), and use an empty string as the replacement, i.e. we remove the middle part.

choroba