7

I often use printf inside shell scripts to produce nicely aligned output.

The problem is that every time there is an accented character (é, è, à) in the printed string, the following string shifts one step back for each accented character.

Example:

printf "%-10s %s\n" "toto" "test"
printf "%-10s %s\n" "titi" "test"
printf "%-10s %s\n" "tété" "test"
printf "%-10s %s\n" "toto" "test"

Expected:

toto       test
titi       test
tété       test
toto       test

Got:

toto       test
titi       test
tété     test
toto       test

Does someone have an explanation for this, and what can I do to make printf handle special characters correctly?

Thank you for your help

Charles Duffy
  • You should look at https://stackoverflow.com/q/17368067/5291015 – Inian Nov 25 '20 at 14:29
  • Hint: compare the outputs of `echo -n "tété" | file -` and `echo -n "titi" | file -`; in UTF-8, each accented character here is made of two bytes – Inian Nov 25 '20 at 14:30
  • See my other answer on the same issue: https://unix.stackexchange.com/a/592479/310674 – Léa Gris Nov 25 '20 at 16:09
  • You're tagging both "bash" and "shell", but not telling us which specific shell you're using (`/bin/sh` is not guaranteed to be bash at all, and behavior with multi-byte characters differs between specific releases of bash, so we need to know _exactly_ which one you're testing with, and also what your locale settings are to be guaranteed to be able to reproduce the issue). – Charles Duffy Nov 25 '20 at 16:24

3 Answers

5

Does someone have an explanation on this

é is a character encoded with two bytes in UTF-8, and bash's printf counts the field width in bytes, not in characters.
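To see the mismatch directly, here is a minimal sketch that counts bytes with `wc -c`. It assumes the script is saved as UTF-8; the variable names `bytes` and `field` are only illustrative. In a shell whose printf pads by bytes (bash does in the versions I have seen), the padded field is 10 bytes long, which leaves only 8 visible columns for "tété".

```shell
s="tété"

# Number of bytes in the string (6 when the script is saved as UTF-8).
# The $(( ... )) wrapper strips the leading spaces some wc versions print.
bytes=$(( $(printf '%s' "$s" | wc -c) ))

# Number of bytes in the padded field produced by %-10s.
# A printf that pads by bytes yields 10 here, i.e. only 4 spaces after 6 bytes of text.
field=$(( $(printf '%-10s' "$s" | wc -c) ))

echo "string bytes: $bytes, padded field bytes: $field"
```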

what can I do to make printf doing it right with special characters?

Design your own padding method that takes UTF-8 characters into account. Ideally, I believe a tool like wprintf, or making the %Ls format specifier call wcwidth() to determine character width, or something similar, would be welcome and useful.

As of now, at least my bash takes UTF-8 characters into account when calculating string length with ${#s}. You could insert the padding yourself:

printf "%-10s %s\n" "titi" "test"
s="tété"
# $(echo -n "$s" | wc -c) is 6 (bytes), but ${#s} is 4 (characters)!
printf "%s%*s %s\n" "$s" "$((10-${#s}))" "" "test"
KamilCuk
4

Adapted from my answer at https://unix.stackexchange.com/a/592479/310674

#!/usr/bin/env bash

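# Truncate $2 to $1 characters, then pad with spaces to exactly $1 characters.
# The padding is computed from the truncated string so it can never go negative
# (a negative width would make %*s add unwanted padding after a long string).
align_left() {
  local s=${2:0:$1}
  printf '%s%*s' "$s" "$(($1 - ${#s}))" ''
}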
 
printf '%s %s\n' \
  "$(align_left 10 "toto")" "test" \
  "$(align_left 10 "titi")" "test" \
  "$(align_left 10 "tété")" "test" \
  "$(align_left 10 "têtu")" "test"

Output:

toto       test
titi       test
tété       test
têtu       test
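The same idea works for right-aligned columns. The sketch below is a hypothetical companion, `align_right`, which is not part of the original answer. It assumes bash (for `local` and `${var:offset:length}`) and a UTF-8 locale, so that `${#s}` counts characters rather than bytes.

```shell
# Hypothetical companion to align_left: right-align $2 in a field of $1 characters.
align_right() {
  local s=${2:0:$1}                          # truncate to the field width
  printf '%*s%s' "$(($1 - ${#s}))" '' "$s"   # emit the padding first, then the text
}

printf '%s %s\n' \
  "$(align_right 10 "toto")" "test" \
  "$(align_right 10 "tété")" "test"
```

In a UTF-8 locale both rows should line up. In a C/POSIX locale `${#s}` falls back to counting bytes and the accented row drifts again.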
Léa Gris
2

You can also use another tool to print your report in that manner. The following example uses awk (gawk pads by characters in a UTF-8 locale; other awk implementations may count bytes):

echo "toto" | awk '{printf "%-10s test\n", $1}'
echo "tété" | awk '{printf "%-10s test\n", $1}'
echo "titi" | awk '{printf "%-10s test\n", $1}'

EDIT:

The following statement was partially wrong: printf might not be part of bash, but of coreutils. Coreutils has a long history with multibyte characters: https://crashcourse.housegordon.org/coreutils-multibyte-support.html.

As noted in a comment by @charles-duffy, printf is, in this case, a shell builtin. You can check it with:

[Alex@NormandySR2 ~]$ type printf
printf is a shell builtin

I also agree that most shells implement their own printf. I checked the following:

  • fish
  • bash
  • zsh
  • tcsh
  • ksh
  • dash
  • oil

All of them use a builtin printf, and these builtins can differ in details. So my assumption that printf was part of coreutils, in this case, was wrong.
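As a quick check of which printf a script actually runs, something like the following can help. It is a sketch that assumes an external printf binary (for example the coreutils one) is installed somewhere on PATH; going through `env` is just one way to bypass the builtin.

```shell
# How the current shell resolves the name "printf" (normally the builtin).
type printf

# Going through env skips the shell builtin and runs the external binary found on PATH.
env printf '%-10s|\n' "toto"
```

Running both variants on an accented string is an easy way to compare how each implementation pads.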

Alex Baranowski
  • `printf` is a builtin in all POSIX-y shells; the coreutils one is only used if explicitly requested (as by running `command printf` instead of `printf`, or using it somewhere like `find -exec` where it's executed as an external binary by necessity). – Charles Duffy Nov 25 '20 at 16:25